Ben Chuanlong Du's Blog

It is never too late to learn.

DataFrame Implementations in Python

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Tips and Traps

Alternatives to pandas for Small Data

  1. Polars is a blazingly fast DataFrames library implemented in Rust using Apache Arrow as memory model. It is the best replacement of pandas for small data at this time. Notice that Polars support multithreading and lazy computation but it cannot handle data larger than memory at this time.

Hands on the Python module dask

Installation

  1. You have to install the complete version of Dask (using the command pip3 install dask[complete]) if you need support of extended memory (for handling big data) and schedulers (for performance). The default installation version (pip3 install dask) of Dask does not include those features out-of-box.