Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Tips and Traps
LazyCsvReader is more limited than CsvReader. CsvReader supports specifying a schema while LazyCsvReader does not.
An empty field is parsed as null instead of an empty string by default, and there is no way to change this behavior at this time. Please refer to this issue for more discussions. Non-empty strings are NOT parsed as null by default. However, parsing special strings as null is supported via the API CsvReader::with_null_values.
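The rule above can be sketched in plain Rust without polars. This is only an illustration of the described behavior, not polars' actual implementation: the helpers parse_field and parse_line are hypothetical names introduced here, and quoting/escaping of CSV fields is ignored.

```rust
/// Hypothetical helper (not part of polars): maps one raw CSV field to an
/// optional string, mirroring the rule described above.
fn parse_field(raw: &str, null_values: &[&str]) -> Option<String> {
    // An empty field always becomes null.
    if raw.is_empty() {
        return None;
    }
    // Any other text becomes null only if it is listed as a null value.
    if null_values.contains(&raw) {
        return None;
    }
    Some(raw.to_string())
}

/// Hypothetical helper: applies `parse_field` to every field of one
/// comma-separated line. Quoting and escaping are deliberately ignored.
fn parse_line(line: &str, null_values: &[&str]) -> Vec<Option<String>> {
    line.split(',')
        .map(|field| parse_field(field, null_values))
        .collect()
}
```

With an empty null-value list, parse_field("NA", &[]) keeps the text "NA", while parse_field("NA", &["NA"]) yields None. An empty field yields None in both cases, which is the behavior that cannot currently be switched off.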
In [2]:
:timing
:sccache 1
:dep polars = { version = "0.26.1", features = ["lazy", "parquet", "dtype-full"] }
Out[2]:
In [3]:
use polars::df;
use polars::prelude::*;
use polars::datatypes::DataType;
use std::fs::File;
use std::io::BufWriter;
use std::io::Write;
Out[3]:
CsvReader and DataFrame
In [8]:
// Explicit dtypes for the five columns of the header-less CSV file.
let mut s = Schema::new();
s.with_column("column_1".into(), DataType::UInt8);
s.with_column("column_2".into(), DataType::UInt8);
s.with_column("column_3".into(), DataType::UInt8);
s.with_column("column_4".into(), DataType::UInt16);
s.with_column("column_5".into(), DataType::Utf8);
s
Out[8]:
In [9]:
// Read the CSV eagerly with the schema above; no extra null values are configured.
let df = CsvReader::from_path("rank53_j0_j0.csv")?
    .has_header(false)
    .with_dtypes(Some(&s))
    .with_null_values(None)
    .finish()?;
df
Out[9]:
In [10]:
// Rows whose column_5 equals the empty string.
df.filter(
    &df.column("column_5")?.equal("")?
)?
Out[10]:
In [11]:
// Rows whose column_5 equals the literal string "NA".
df.filter(
    &df.column("column_5")?.equal("NA")?
)?
Out[11]:
In [12]:
// Rows whose column_5 is null.
df.filter(
    &df.column("column_5")?.is_null()
)?
Out[12]:
LazyCsvReader and LazyFrame
In [20]:
// Same file read lazily; collect() materializes the LazyFrame into a DataFrame.
let df: LazyFrame = LazyCsvReader::new("rank53_j0_j0.csv")
    .has_header(false)
    .with_null_values(None)
    .finish()?;
df.collect()?
Out[20]: