Ben Chuanlong Du's Blog

It is never too late to learn.

Read and Write Parquet Files in Rust

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

There are a few Rust crates that can read and write Parquet files, among which Polars is the best. As a matter of fact, Polars is a DataFrame …

Spark Issue: RuntimeError: Arrow Legacy IPC Format Is Not Supported

Symptoms

RuntimeError: Arrow legacy IPC format is not supported in PySpark, please unset ARROW_PRE_0_15_IPC_FORMAT

Possible Causes

You are using PySpark 3.0+ with one (or both) of the following options.

--conf …
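The error message itself asks for ARROW_PRE_0_15_IPC_FORMAT to be unset. A minimal sketch of doing that from Python, assuming the variable was exported in the driver's own environment and that this runs before the Spark session is created. The helper name unset_legacy_arrow_flag is made up for illustration, and this does not touch anything passed through --conf options, which have to be removed from the submit command instead.

```python
import os

# Name of the variable quoted in the error message.
ENV_VAR = "ARROW_PRE_0_15_IPC_FORMAT"


def unset_legacy_arrow_flag(env=os.environ):
    """Remove the legacy Arrow IPC flag from `env`.

    Returns the previous value, or None if the variable was not set.
    (Hypothetical helper, not part of PySpark.)
    """
    return env.pop(ENV_VAR, None)


# Call this before creating the SparkSession in the driver process.
previous = unset_legacy_arrow_flag()
if previous is not None:
    print(f"Removed {ENV_VAR}={previous!r} from the driver environment")
```

Passing a plain dict instead of os.environ makes the helper easy to try out without changing the real environment.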

Python Modules for Date and Time

datetime

dateutil

Useful extensions to the standard Python datetime features

dateparser

Python parser for human-readable dates

arrow

Better dates & times for Python.

monthdelta
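As a baseline for comparing the modules above, here is a small sketch that uses only the standard-library datetime module: parsing an ISO 8601 string, adding a fixed offset, and formatting the result. The sample timestamp and the helper name parse_iso are arbitrary choices for illustration. The third-party packages in the list are not used here.

```python
from datetime import datetime, timedelta, timezone


def parse_iso(text):
    """Parse an ISO 8601 string into a naive datetime."""
    return datetime.fromisoformat(text)


start = parse_iso("2021-03-14T09:30:00")

# timedelta handles fixed-length offsets (days, hours, seconds).
end = start + timedelta(days=1, hours=2)
elapsed = end - start

# Attach a time zone explicitly; naive datetimes carry none.
stamp = start.replace(tzinfo=timezone.utc).isoformat()

print(end.strftime("%Y-%m-%d %H:%M"))  # 2021-03-15 11:30
print(elapsed.total_seconds())         # 93600.0
print(stamp)                           # 2021-03-14T09:30:00+00:00
```

timedelta has no notion of calendar months, which is the gap that month-oriented helpers are meant to fill.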

Data Types in Different Programming Languages

Data Type | C | C++ | Rust | Java | Python | numpy | pyarrow | Spark SQL | SQL
8-bit integer | short (16-bit) | int8_t | i8 | short (16-bit) | int (arbitrary precision) | int8 | TinyInt …
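To make the contrast in the Python column concrete, the sketch below compares Python's arbitrary-precision int with a fixed 8-bit signed integer, using only the standard-library struct and ctypes modules. The helper names fits_int8 and wrap_int8 are made up for illustration. numpy and pyarrow are not needed for it.

```python
import ctypes
import struct

# Python's int grows as needed; there is no overflow at 8, 32 or 64 bits.
big = 2 ** 100
print(big.bit_length())  # 101

# Standard sizes in bytes of a signed char and a short.
print(struct.calcsize("<b"), struct.calcsize("<h"))  # 1 2


def fits_int8(value):
    """Return True if `value` is representable as a signed 8-bit integer."""
    try:
        struct.pack("<b", value)
        return True
    except struct.error:
        return False


def wrap_int8(value):
    """Truncate `value` to 8 bits, as a fixed-width type would on overflow."""
    return ctypes.c_int8(value).value


print(fits_int8(127), fits_int8(128))  # True False
print(wrap_int8(200))                  # -56
```

struct rejects out-of-range values, while ctypes silently wraps them. The second behaviour is the one to watch for when moving data from Python into fixed-width columns.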

Tips on Apache Arrow

Feather vs Parquet: https://github.com/wesm/feather/issues/188

References

https://github.com/wesm/feather