Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Tips and Traps¶
Polars is a blazingly fast DataFrames library implemented in Rust, using Apache Arrow as its memory model. It supports multithreading and lazy computation.
The Rust crate polars has many optional features. Be sure to enable the features required for your use cases. Below are some commonly useful features.
- lazy: Turns on support for lazy computation (LazyFrame, col, Expr, etc.). Notice that col and Expr are only for LazyFrame.
- parquet: Turns on support for the Parquet format (reading/writing Parquet files).

Notice that Polars CANNOT handle data larger than memory at this time (even though this might change in the future).
Polars does not currently provide a way to scan Parquet files row by row. You can use the parquet crate to achieve this.
In [2]:
:timing
:sccache 1
:dep polars = { version = "0.26.1", features = ["lazy", "parquet"] }
Out[2]:
In [6]:
use polars::df;
use polars::prelude::*;
use polars::datatypes::DataType;
use std::fs::File;
use std::io::BufWriter;
use std::io::Write;
DataFrame.head¶
In [10]:
frame.head(None)
Out[10]:
DataFrame.shape¶
In [9]:
frame.shape()
Out[9]:
DataFrame.height¶
In [10]:
frame.height()
Out[10]:
In [11]:
frame.width()
Out[11]:
DataFrame.apply¶
In [27]:
fn as_u64(id: &Series) -> Series {
    // Cast the Series to UInt64, panicking if the cast fails.
    id.cast(&DataType::UInt64).unwrap()
}
In [29]:
df.apply("id0", as_u64).unwrap();
df.apply("id1", as_u64).unwrap();
df.apply("id2", as_u64).unwrap();
df
Out[29]:
In [36]:
let f = File::create("j.parquet").expect("Unable to create file");
let bfw = BufWriter::new(f);
let pw = ParquetWriter::new(bfw).with_compression(ParquetCompression::Snappy);
In [38]:
pw.finish(&mut df).unwrap();
Loop Through Rows¶
In [19]:
{
    // Print the first 5 rows, one column value at a time.
    let columns = df.get_columns();
    for i in 0..5 {
        print!("{i}: ");
        columns.iter().for_each(|s: &Series| {
            print!("{:?} ", s.get(i));
        });
        println!();
    }
}
Out[19]: