Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Tips and Traps¶

polars.DataFrame.unique and polars.Series.unique do not maintain the original order by default. To maintain the original order, pass the option maintain_order=True.

Polars ¶

Polars is a blazingly fast DataFrame library implemented in Rust that uses Apache Arrow as its memory model.

It is the best replacement for pandas for small data at this time.
Polars supports multithreading and lazy computation.
Polars CANNOT handle data larger than memory at this time (even though this might change in the future).

Comparison with pandas DataFrame¶

Polars intentionally leaves out the concept of (row) index.
There are no methods such as loc and iloc in Polars. You can use df.get_column(col) or df[col] to access a single column, and df[[col1, col2]] to select multiple columns. df.get_columns() returns all columns as a list of Series.
Similar to pandas DataFrame, chained access works but chained assignment does not. To assign a value to a single element, use df[row_index, col_name] = val instead. Note, however, that this is inefficient because it updates the whole column under the hood. If you have to update many values of a column in a Polars DataFrame, do NOT loop through the cells and update them one by one. Instead, create a Series that contains the updated values and then update the column only once. For more discussion, please refer to "Efficient way to update a single cell of a Polars DataFrame?".
Polars provides polars.from_pandas and DataFrame.to_pandas to convert between Polars and pandas DataFrames.
Polars' APIs for parsing CSV files are not as flexible as pandas's. Luckily, we can parse CSV files using pandas and then convert the pandas DataFrames into Polars DataFrames.

In [1]:

!pip3 install --user polars

Collecting polars
  Downloading polars-0.16.2-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.2/15.2 MB 28.7 MB/s eta 0:00:00
Installing collected packages: polars
Successfully installed polars-0.16.2

In [2]:

import itertools as it
import polars as pl

Series¶

In [38]:

[m for m in dir(pl.Series) if not m.startswith("_")]

Out[38]:

['abs',
 'alias',
 'all',
 'any',
 'append',
 'apply',
 'arccos',
 'arccosh',
 'arcsin',
 'arcsinh',
 'arctan',
 'arctanh',
 'arg_max',
 'arg_min',
 'arg_sort',
 'arg_true',
 'arg_unique',
 'argsort',
 'arr',
 'bin',
 'cast',
 'cat',
 'ceil',
 'chunk_lengths',
 'cleared',
 'clip',
 'clip_max',
 'clip_min',
 'clone',
 'cos',
 'cosh',
 'cummax',
 'cummin',
 'cumprod',
 'cumsum',
 'cumulative_eval',
 'describe',
 'diff',
 'dot',
 'drop_nans',
 'drop_nulls',
 'dt',
 'dtype',
 'entropy',
 'estimated_size',
 'ewm_mean',
 'ewm_std',
 'ewm_var',
 'exp',
 'explode',
 'extend_constant',
 'fill_nan',
 'fill_null',
 'filter',
 'flags',
 'floor',
 'get_chunks',
 'has_validity',
 'hash',
 'head',
 'inner_dtype',
 'interpolate',
 'is_boolean',
 'is_datelike',
 'is_duplicated',
 'is_empty',
 'is_finite',
 'is_first',
 'is_float',
 'is_in',
 'is_infinite',
 'is_nan',
 'is_not_nan',
 'is_not_null',
 'is_null',
 'is_numeric',
 'is_sorted',
 'is_unique',
 'is_utf8',
 'item',
 'kurtosis',
 'len',
 'limit',
 'log',
 'log10',
 'max',
 'mean',
 'median',
 'min',
 'mode',
 'n_chunks',
 'n_unique',
 'name',
 'nan_max',
 'nan_min',
 'new_from_index',
 'null_count',
 'pct_change',
 'peak_max',
 'peak_min',
 'product',
 'quantile',
 'rank',
 'rechunk',
 'reinterpret',
 'rename',
 'reshape',
 'reverse',
 'rolling_apply',
 'rolling_max',
 'rolling_mean',
 'rolling_median',
 'rolling_min',
 'rolling_quantile',
 'rolling_skew',
 'rolling_std',
 'rolling_sum',
 'rolling_var',
 'round',
 'sample',
 'search_sorted',
 'series_equal',
 'set',
 'set_at_idx',
 'set_sorted',
 'shape',
 'shift',
 'shift_and_fill',
 'shrink_dtype',
 'shrink_to_fit',
 'shuffle',
 'sign',
 'sin',
 'sinh',
 'skew',
 'slice',
 'sort',
 'sqrt',
 'std',
 'str',
 'struct',
 'sum',
 'tail',
 'take',
 'take_every',
 'tan',
 'tanh',
 'time_unit',
 'to_arrow',
 'to_dummies',
 'to_frame',
 'to_list',
 'to_numpy',
 'to_pandas',
 'to_physical',
 'top_k',
 'unique',
 'unique_counts',
 'value_counts',
 'var',
 'view',
 'zip_with']

In [36]:

s = pl.Series([1, 2, 3])
s

Out[36]:

shape: (3,)


i64
1
2
3

In [37]:

s[0] = 100
s

Out[37]:

shape: (3,)


i64
100
2
3

DataFrame¶

In [25]:

[m for m in dir(pl.DataFrame) if not m.startswith("_")]

Out[25]:

['apply',
 'cleared',
 'clone',
 'columns',
 'describe',
 'drop',
 'drop_in_place',
 'drop_nulls',
 'dtypes',
 'estimated_size',
 'explode',
 'extend',
 'fill_nan',
 'fill_null',
 'filter',
 'find_idx_by_name',
 'fold',
 'frame_equal',
 'get_column',
 'get_columns',
 'glimpse',
 'groupby',
 'groupby_dynamic',
 'groupby_rolling',
 'hash_rows',
 'head',
 'height',
 'hstack',
 'insert_at_idx',
 'interpolate',
 'is_duplicated',
 'is_empty',
 'is_unique',
 'item',
 'iterrows',
 'join',
 'join_asof',
 'lazy',
 'limit',
 'max',
 'mean',
 'median',
 'melt',
 'merge_sorted',
 'min',
 'n_chunks',
 'n_unique',
 'null_count',
 'partition_by',
 'pearson_corr',
 'pipe',
 'pivot',
 'product',
 'quantile',
 'rechunk',
 'rename',
 'replace',
 'replace_at_idx',
 'reverse',
 'row',
 'rows',
 'sample',
 'schema',
 'select',
 'shape',
 'shift',
 'shift_and_fill',
 'shrink_to_fit',
 'slice',
 'sort',
 'std',
 'sum',
 'tail',
 'take_every',
 'to_arrow',
 'to_dict',
 'to_dicts',
 'to_dummies',
 'to_numpy',
 'to_pandas',
 'to_series',
 'to_struct',
 'transpose',
 'unique',
 'unnest',
 'unstack',
 'upsample',
 'var',
 'vstack',
 'width',
 'with_column',
 'with_columns',
 'with_row_count',
 'write_avro',
 'write_csv',
 'write_ipc',
 'write_json',
 'write_ndjson',
 'write_parquet']

In [4]:

df = pl.read_csv("https://j.mp/iriscsv")
df

Out[4]:

shape: (150, 5)

sepal_length	sepal_width	petal_length	petal_width	species
f64	f64	f64	f64	str
5.1	3.5	1.4	0.2	"setosa"
4.9	3.0	1.4	0.2	"setosa"
4.7	3.2	1.3	0.2	"setosa"
4.6	3.1	1.5	0.2	"setosa"
5.0	3.6	1.4	0.2	"setosa"
5.4	3.9	1.7	0.4	"setosa"
4.6	3.4	1.4	0.3	"setosa"
5.0	3.4	1.5	0.2	"setosa"
4.4	2.9	1.4	0.2	"setosa"
4.9	3.1	1.5	0.1	"setosa"
5.4	3.7	1.5	0.2	"setosa"
4.8	3.4	1.6	0.2	"setosa"
...	...	...	...	...
6.0	3.0	4.8	1.8	"virginica"
6.9	3.1	5.4	2.1	"virginica"
6.7	3.1	5.6	2.4	"virginica"
6.9	3.1	5.1	2.3	"virginica"
5.8	2.7	5.1	1.9	"virginica"
6.8	3.2	5.9	2.3	"virginica"
6.7	3.3	5.7	2.5	"virginica"
6.7	3.0	5.2	2.3	"virginica"
6.3	2.5	5.0	1.9	"virginica"
6.5	3.0	5.2	2.0	"virginica"
6.2	3.4	5.4	2.3	"virginica"
5.9	3.0	5.1	1.8	"virginica"

In [32]:

df["sepal_length"]

Out[32]:

shape: (150,)

sepal_length
f64
1000.0
4.9
4.7
4.6
5.0
5.4
4.6
5.0
4.4
4.9
5.4
4.8
...
6.0
6.9
6.7
6.9
5.8
6.8
6.7
6.7
6.3
6.5
6.2
5.9

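Similar to pandas DataFrame, chained assignment does NOT work! The assignment below modifies a temporary Series, so df is left unchanged: the first value is still the 1000.0 set by an earlier cell, not 10000.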

In [35]:

df["sepal_length"][0] = 10000
df

Out[35]:

shape: (150, 5)

sepal_length	sepal_width	petal_length	petal_width	species
f64	f64	f64	f64	str
1000.0	3.5	1.4	0.2	"setosa"
4.9	3.0	1.4	0.2	"setosa"
4.7	3.2	1.3	0.2	"setosa"
4.6	3.1	1.5	0.2	"setosa"
5.0	3.6	1.4	0.2	"setosa"
5.4	3.9	1.7	0.4	"setosa"
4.6	3.4	1.4	0.3	"setosa"
5.0	3.4	1.5	0.2	"setosa"
4.4	2.9	1.4	0.2	"setosa"
4.9	3.1	1.5	0.1	"setosa"
5.4	3.7	1.5	0.2	"setosa"
4.8	3.4	1.6	0.2	"setosa"
...	...	...	...	...
6.0	3.0	4.8	1.8	"virginica"
6.9	3.1	5.4	2.1	"virginica"
6.7	3.1	5.6	2.4	"virginica"
6.9	3.1	5.1	2.3	"virginica"
5.8	2.7	5.1	1.9	"virginica"
6.8	3.2	5.9	2.3	"virginica"
6.7	3.3	5.7	2.5	"virginica"
6.7	3.0	5.2	2.3	"virginica"
6.3	2.5	5.0	1.9	"virginica"
6.5	3.0	5.2	2.0	"virginica"
6.2	3.4	5.4	2.3	"virginica"
5.9	3.0	5.1	1.8	"virginica"

You can slice by row and column at the same time.

In [28]:

df[0, "sepal_length"]

Out[28]:

5.1

In [65]:

dir(df)

Out[65]:

['__add__',
 '__annotations__',
 '__bool__',
 '__class__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__module__',
 '__mul__',
 '__ne__',
 '__new__',
 '__radd__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__truediv__',
 '__weakref__',
 '_accessors',
 '_comp',
 '_compare_to_non_df',
 '_compare_to_other_df',
 '_df',
 '_from_arrow',
 '_from_dict',
 '_from_dicts',
 '_from_numpy',
 '_from_pandas',
 '_from_pydf',
 '_from_records',
 '_ipython_key_completions_',
 '_pos_idx',
 '_pos_idxs',
 '_read_avro',
 '_read_csv',
 '_read_ipc',
 '_read_json',
 '_read_ndjson',
 '_read_parquet',
 '_repr_html_',
 'apply',
 'cleared',
 'clone',
 'columns',
 'describe',
 'drop',
 'drop_in_place',
 'drop_nulls',
 'dtypes',
 'estimated_size',
 'explode',
 'extend',
 'fill_nan',
 'fill_null',
 'filter',
 'find_idx_by_name',
 'fold',
 'frame_equal',
 'get_column',
 'get_columns',
 'glimpse',
 'groupby',
 'groupby_dynamic',
 'groupby_rolling',
 'hash_rows',
 'head',
 'height',
 'hstack',
 'insert_at_idx',
 'interpolate',
 'is_duplicated',
 'is_empty',
 'is_unique',
 'item',
 'iterrows',
 'join',
 'join_asof',
 'lazy',
 'limit',
 'max',
 'mean',
 'median',
 'melt',
 'merge_sorted',
 'min',
 'n_chunks',
 'n_unique',
 'null_count',
 'partition_by',
 'pearson_corr',
 'pipe',
 'pivot',
 'product',
 'quantile',
 'rechunk',
 'rename',
 'replace',
 'replace_at_idx',
 'reverse',
 'row',
 'rows',
 'sample',
 'schema',
 'select',
 'shape',
 'shift',
 'shift_and_fill',
 'shrink_to_fit',
 'slice',
 'sort',
 'std',
 'sum',
 'tail',
 'take_every',
 'to_arrow',
 'to_dict',
 'to_dicts',
 'to_dummies',
 'to_numpy',
 'to_pandas',
 'to_series',
 'to_struct',
 'transpose',
 'unique',
 'unnest',
 'unstack',
 'upsample',
 'var',
 'vstack',
 'width',
 'with_column',
 'with_columns',
 'with_row_count',
 'write_avro',
 'write_csv',
 'write_ipc',
 'write_json',
 'write_ndjson',
 'write_parquet']

In [29]:

df[0, "sepal_length"] = 1000
df

Out[29]:

shape: (150, 5)

sepal_length	sepal_width	petal_length	petal_width	species
f64	f64	f64	f64	str
1000.0	3.5	1.4	0.2	"setosa"
4.9	3.0	1.4	0.2	"setosa"
4.7	3.2	1.3	0.2	"setosa"
4.6	3.1	1.5	0.2	"setosa"
5.0	3.6	1.4	0.2	"setosa"
5.4	3.9	1.7	0.4	"setosa"
4.6	3.4	1.4	0.3	"setosa"
5.0	3.4	1.5	0.2	"setosa"
4.4	2.9	1.4	0.2	"setosa"
4.9	3.1	1.5	0.1	"setosa"
5.4	3.7	1.5	0.2	"setosa"
4.8	3.4	1.6	0.2	"setosa"
...	...	...	...	...
6.0	3.0	4.8	1.8	"virginica"
6.9	3.1	5.4	2.1	"virginica"
6.7	3.1	5.6	2.4	"virginica"
6.9	3.1	5.1	2.3	"virginica"
5.8	2.7	5.1	1.9	"virginica"
6.8	3.2	5.9	2.3	"virginica"
6.7	3.3	5.7	2.5	"virginica"
6.7	3.0	5.2	2.3	"virginica"
6.3	2.5	5.0	1.9	"virginica"
6.5	3.0	5.2	2.0	"virginica"
6.2	3.4	5.4	2.3	"virginica"
5.9	3.0	5.1	1.8	"virginica"

In [13]:

df.columns

Out[13]:

['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

In [16]:

s = df.get_column("sepal_length")
s

Out[16]:

shape: (150,)

sepal_length
f64
5.1
4.9
4.7
4.6
5.0
5.4
4.6
5.0
4.4
4.9
5.4
4.8
...
6.0
6.9
6.7
6.9
5.8
6.8
6.7
6.7
6.3
6.5
6.2
5.9

In [19]:

s[0] = 1000
s

Out[19]:

shape: (150,)

sepal_length
f64
1000.0
4.9
4.7
4.6
5.0
5.4
4.6
5.0
4.4
4.9
5.4
4.8
...
6.0
6.9
6.7
6.9
5.8
6.8
6.7
6.7
6.3
6.5
6.2
5.9

In [20]:

df

Out[20]:

shape: (150, 5)

sepal_length	sepal_width	petal_length	petal_width	species
f64	f64	f64	f64	str
5.1	3.5	1.4	0.2	"setosa"
4.9	3.0	1.4	0.2	"setosa"
4.7	3.2	1.3	0.2	"setosa"
4.6	3.1	1.5	0.2	"setosa"
5.0	3.6	1.4	0.2	"setosa"
5.4	3.9	1.7	0.4	"setosa"
4.6	3.4	1.4	0.3	"setosa"
5.0	3.4	1.5	0.2	"setosa"
4.4	2.9	1.4	0.2	"setosa"
4.9	3.1	1.5	0.1	"setosa"
5.4	3.7	1.5	0.2	"setosa"
4.8	3.4	1.6	0.2	"setosa"
...	...	...	...	...
6.0	3.0	4.8	1.8	"virginica"
6.9	3.1	5.4	2.1	"virginica"
6.7	3.1	5.6	2.4	"virginica"
6.9	3.1	5.1	2.3	"virginica"
5.8	2.7	5.1	1.9	"virginica"
6.8	3.2	5.9	2.3	"virginica"
6.7	3.3	5.7	2.5	"virginica"
6.7	3.0	5.2	2.3	"virginica"
6.3	2.5	5.0	1.9	"virginica"
6.5	3.0	5.2	2.0	"virginica"
6.2	3.4	5.4	2.3	"virginica"
5.9	3.0	5.1	1.8	"virginica"

In [22]:

df.get_column("sepal_length")[0] = 2000
df

Out[22]:

shape: (150, 5)

sepal_length	sepal_width	petal_length	petal_width	species
f64	f64	f64	f64	str
5.1	3.5	1.4	0.2	"setosa"
4.9	3.0	1.4	0.2	"setosa"
4.7	3.2	1.3	0.2	"setosa"
4.6	3.1	1.5	0.2	"setosa"
5.0	3.6	1.4	0.2	"setosa"
5.4	3.9	1.7	0.4	"setosa"
4.6	3.4	1.4	0.3	"setosa"
5.0	3.4	1.5	0.2	"setosa"
4.4	2.9	1.4	0.2	"setosa"
4.9	3.1	1.5	0.1	"setosa"
5.4	3.7	1.5	0.2	"setosa"
4.8	3.4	1.6	0.2	"setosa"
...	...	...	...	...
6.0	3.0	4.8	1.8	"virginica"
6.9	3.1	5.4	2.1	"virginica"
6.7	3.1	5.6	2.4	"virginica"
6.9	3.1	5.1	2.3	"virginica"
5.8	2.7	5.1	1.9	"virginica"
6.8	3.2	5.9	2.3	"virginica"
6.7	3.3	5.7	2.5	"virginica"
6.7	3.0	5.2	2.3	"virginica"
6.3	2.5	5.0	1.9	"virginica"
6.5	3.0	5.2	2.0	"virginica"
6.2	3.4	5.4	2.3	"virginica"
5.9	3.0	5.1	1.8	"virginica"

In [17]:

type(s)

Out[17]:

polars.internals.series.series.Series

pl.all¶

In [22]:

comp.select(pl.all().all())

Out[22]:

shape: (1, 3)

j0	j1	ranks
bool	bool	bool
true	true	true

DataFrame.frame_equal ¶

Check whether a DataFrame is equal to another DataFrame, element by element.

In [26]:

df.filter((df["i0"] == 1) & (df["i1"] == 2) & (df["i2"] == 13))[
    ["j0", "j1", "ranks"]
].frame_equal(
    df.filter((df["i0"] == 1) & (df["i1"] == 2) & (df["i2"] == 26))[
        ["j0", "j1", "ranks"]
    ]
)

Out[26]:

True

In [3]:

df = pl.DataFrame(
    {
        "id": [0, 1, 2, 3, 4],
        "color": ["red", "green", "green", "red", "red"],
        "shape": ["square", "triangle", "square", "triangle", "square"],
    }
)
df

Out[3]:

shape: (5, 3)

id	color	shape
i64	str	str
0	"red"	"square"
1	"green"	"triangle"
2	"green"	"square"
3	"red"	"triangle"
4	"red"	"square"

In [5]:

df.filter(pl.col("sepal_length") > 5).groupby("species").sum()

Out[5]:

shape: (3, 5)

species	sepal_length	sepal_width	petal_length	petal_width
str	f64	f64	f64	f64
"versicolor"	281.9	131.8	202.9	63.3
"setosa"	116.9	81.7	33.2	6.1
"virginica"	324.5	146.2	273.1	99.6

In [7]:

df = pl.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
    }
)
df

Out[7]:

A	fruits	B	cars
i64	str	i64	str
1	"banana"	5	"beetle"
2	"banana"	4	"audi"
3	"apple"	3	"beetle"
4	"apple"	2	"beetle"
5	"banana"	1	"beetle"

In [8]:

df.sort("fruits").select(
    [
        "fruits",
        "cars",
        pl.lit("fruits").alias("literal_string_fruits"),
        pl.col("B").filter(pl.col("cars") == "beetle").sum(),
        pl.col("A")
        .filter(pl.col("B") > 2)
        .sum()
        .over("cars")
        .alias("sum_A_by_cars"),  # groups by "cars"
        pl.col("A").sum().over("fruits").alias("sum_A_by_fruits"),  # groups by "fruits"
        pl.col("A")
        .reverse()
        .over("fruits")
        .flatten()
        .alias("rev_A_by_fruits"),  # groups by "fruits"
        pl.col("A")
        .sort_by("B")
        .over("fruits")
        .flatten()
        .alias("sort_A_by_B_by_fruits"),  # groups by "fruits"
    ]
)

Out[8]:

fruits	cars	literal_string_fruits	B	sum_A_by_cars	sum_A_by_fruits	rev_A_by_fruits	sort_A_by_B_by_fruits
str	str	str	i64	i64	i64	i64	i64
"apple"	"beetle"	"fruits"	11	4	7	4	4
"apple"	"beetle"	"fruits"	11	4	7	3	3
"banana"	"beetle"	"fruits"	11	4	8	5	5
"banana"	"audi"	"fruits"	11	2	8	2	2
"banana"	"beetle"	"fruits"	11	4	8	1	1

In [54]:

df.to_dict("records")

Out[54]:

{'id': shape: (5,)
 Series: 'id' [i64]
 [
 	0
 	1
 	2
 	3
 	4
 ],
 'color': shape: (5,)
 Series: 'color' [str]
 [
 	"red"
 	"green"
 	"green"
 	"red"
 	"red"
 ],
 'shape': shape: (5,)
 Series: 'shape' [str]
 [
 	"square"
 	"triangle"
 	"square"
 	"triangle"
 	"square"
 ]}

In [56]:

df.to_dicts()

Out[56]:

[{'id': 0, 'color': 'red', 'shape': 'square'},
 {'id': 1, 'color': 'green', 'shape': 'triangle'},
 {'id': 2, 'color': 'green', 'shape': 'square'},
 {'id': 3, 'color': 'red', 'shape': 'triangle'},
 {'id': 4, 'color': 'red', 'shape': 'square'}]

In [60]:

ss = df.to_struct("ss")
ss

Out[60]:

shape: (5,)

ss
struct[3]
{0,"red","square"}
{1,"green","triangle"}
{2,"green","square"}
{3,"red","triangle"}
{4,"red","square"}

In [62]:

type(ss[0])

Out[62]:

dict

sort¶

DataFrame.sort is not in-place. It returns a new DataFrame.

In [63]:

?pl.DataFrame.sort

Signature:
pl.DataFrame.sort(
    self: 'DF',
    by: 'str | pli.Expr | Sequence[str] | Sequence[pli.Expr]',
    reverse: 'bool | list[bool]' = False,
    nulls_last: 'bool' = False,
) -> 'DF | DataFrame'
Docstring:
Sort the DataFrame by column.

Parameters
----------
by
    By which column to sort. Only accepts string.
reverse
    Reverse/descending sort.
nulls_last
    Place null values last. Can only be used if sorted by a single column.

Examples
--------
>>> df = pl.DataFrame(
...     {
...         "foo": [1, 2, 3],
...         "bar": [6.0, 7.0, 8.0],
...         "ham": ["a", "b", "c"],
...     }
... )
>>> df.sort("foo", reverse=True)
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 3   ┆ 8.0 ┆ c   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ 7.0 ┆ b   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 1   ┆ 6.0 ┆ a   │
└─────┴─────┴─────┘

**Sort by multiple columns.**
For multiple columns we can also use expression syntax.

>>> df.sort(
...     [pl.col("foo"), pl.col("bar") ** 2],
...     reverse=[True, False],
... )
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 3   ┆ 8.0 ┆ c   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ 7.0 ┆ b   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 1   ┆ 6.0 ┆ a   │
└─────┴─────┴─────┘
File:      ~/.local/lib/python3.10/site-packages/polars/internals/dataframe/frame.py
Type:      function

to_pandas¶

In [4]:

df = pl.DataFrame(
    {
        "foo": [1, 2, 3],
        "bar": [6, 7, 8],
        "ham": ["a", "b", "c"],
    }
)
dfp = df.to_pandas()
dfp

Out[4]:

	foo	bar	ham
0	1	6	a
1	2	7	b
2	3	8	c

from_pandas ¶

In [5]:

pl.from_pandas(dfp)

Out[5]:

shape: (3, 3)

foo	bar	ham
i64	i64	str
1	6	"a"
2	7	"b"
3	8	"c"

Ben Chuanlong Du's Blog

It is never too late to learn.

Hands on the Polars Library in Python
