Ben Chuanlong Du's Blog

It is never too late to learn.

Dataframe for JVM

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Spark DataFrame

Spark DataFrame is a great implementation of a distributed DataFrame, if you don't mind taking a dependency on Spark. It can of course also be used in a non-distributed (local) mode. Spark DataFrame is friendliest to Scala (Spark/Scala) and Python (PySpark), and can be used in Jupyter/JupyterLab notebooks.


Another option is to cross the Python/Java boundary via a bridge library (JPype, Py4J or PyJNIus).


Tablesaw is currently the most mature non-distributed DataFrame implementation for JVM languages. However, its usability still lags far behind that of Spark DataFrame and the Python pandas DataFrame.


krangl is a DataFrame implementation in Kotlin.
