Ben Chuanlong Du's Blog

It is never too late to learn.

Filter a Polars DataFrame in Rust

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Tips and Traps

  1. LazyFrame.filter filters rows using an Expr while DataFrame.filter filters rows using a mask of the type ChunkedArray<BooleanType>.

Filter a Polars LazyFrame in Rust

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Tips and Traps

  1. LazyFrame.filter filters rows using an Expr while DataFrame.filter filters rows using a mask of the type ChunkedArray<BooleanType>.

Row-based Mapping and Filtering on DataFrames in Spark

Comments

Spark DataFrame is an alias to Dataset[Row]. Even though a Spark DataFrame is stored as Rows in a Dataset, built-in operations/functions (in org.apache.spark.sql.functions) for Spark DataFrame are Column-based. Sometimes, there might be transformations on a DataFrame that is hard to express as Column expressions but rather evey convenient to express as Row expressions. The traditional way to resolve this issue is to wrap the row-based function into a UDF. It is worthing knowing that Spark DataFrame supports map/flatMap APIs which works on Rows. They are still experimental as Spark 2.4.3. It is suggested that you stick to Column-based operations/functions until the Row-based methods mature.