Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Tips and Traps
LazyCsvReader is more limited than CsvReader: CsvReader supports specifying a schema, while LazyCsvReader does not.
An empty field is parsed as `null` instead of an empty string by default, and there is no way to change this behavior at this time. Please refer to this issue.
Cast Types of Columns in Pandas
Tips and Traps
You can use the method `Series.astype` to cast the type of a series. `Series.astype(str)` converts `NaN`s to the string literal `nan`. This is often NOT what people want. A better way is to use `Series.astype(object)`.
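As a rough illustration of the note above, the sketch below casts a small series containing a missing value. The series contents and variable names are made up for the example. What `astype(str)` does with missing values may depend on the pandas version, so the snippet only prints that count and relies on the `astype(object)` behavior.

```python
import numpy as np
import pandas as pd

# A small series with one missing value (illustrative data only).
s = pd.Series(["a", np.nan, "c"])

# Casting to object keeps the missing value as a real missing value.
as_object = s.astype(object)
missing_mask = as_object.isna().tolist()
print(missing_mask)

# Casting to str may turn the missing value into the literal text "nan";
# count how many such literals appear on the installed pandas version.
as_str = s.astype(str)
nan_literals = int((as_str == "nan").sum())
print(nan_literals)
```

After `astype(object)`, functions such as `isna`, `fillna`, and `dropna` still see the missing value, which is usually what downstream code expects.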
Convert Pandas DataFrame to Other Format
Coalesce and Repartition in Spark DataFrame
DataFrame Implementations in Python
Tips and Traps
Alternatives to pandas for Small Data
- Polars is a blazingly fast DataFrame library implemented in Rust that uses Apache Arrow as its memory model. It is the best replacement for pandas on small data at this time. Note that Polars supports multithreading and lazy computation, but it cannot handle data larger than memory at this time.