Ben Chuanlong Du's Blog

It is never too late to learn.

Data Frame Implementations in Rust

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Polars

Polars is a fast multi-threaded DataFrame library in Rust and Python.

datafusion

datafusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format …

Hands on the Polars Crate in Rust

Tips and Traps

  1. Polars is a blazingly fast DataFrame library implemented in Rust that uses Apache Arrow as its memory model. It supports multithreading and lazy computation.

  2. The Rust crate polars has many features. Be sure to enable the features required for your use case. Below are some commonly useful features.

Hands on the Polars Library in Python

Tips and Traps

  1. polars.DataFrame.unique and polars.Series.unique do not maintain the original order by default. To maintain the original order, pass the option maintain_order=True.

Polars

Polars is a blazingly fast DataFrame library implemented in Rust that uses Apache Arrow as its memory model.

Read and Write Parquet Files in Rust

There are a few crates in Rust which can help read and write Parquet files, among which Polars is the best one. As a matter of fact, polars is a DataFrame …

Read CSV Files Using Polars in Rust

Tips and Traps

  1. LazyCsvReader is more limited than CsvReader: CsvReader supports specifying a schema, while LazyCsvReader does not.

  2. An empty field is parsed as null instead of an empty string by default, and there is no way to change this behavior at this time. Please refer to this issue.