In [1]:

import pandas as pd
import numpy as np
import datetime

Summary¶

There are many ways to filter rows of a pandas DataFrame. The slicing way is the universal way, however, the method DataFrame.query can be more convenient for simple filtering conditions and it is more efficient too.
DataFrame.where is not intened for filtering but rather a analog to case statement in SQL and if/else or ternary expressions.

In [2]:

df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "word": ["Mary", "is", "a", "fox", "."],
        "date": [datetime.date.today() + datetime.timedelta(days=i) for i in range(5)],
    }
)
df

Out[2]:

	id	word	date
0	1	Mary	2020-11-27
1	2	is	2020-11-28
2	3	a	2020-11-29
3	4	fox	2020-11-30
4	5	.	2020-12-01

`DataFrame.query`¶

In [3]:

df.query("id % 2 == 0")

Out[3]:

	id	word	date
1	2	is	2020-11-28
3	4	fox	2020-11-30

In [4]:

df.query("id in [3, 5, 7]")

Out[4]:

	id	word	date
2	3	a	2020-11-29
4	5	.	2020-12-01

In [29]:

x = [3, 5, 7]
df.query("id in @x")

Out[29]:

	id	word	date
2	3	a	2020-06-07
4	5	.	2020-06-09

In [30]:

df.query("word.str.len() == id")

Out[30]:

	id	word	date
1	2	is	2020-06-06

In [31]:

df.query("word.str.lower() == 'mary'")

Out[31]:

	id	word	date
0	1	Mary	2020-06-05

In [10]:

df.query("word.str.startswith('f')")

Out[10]:

	id	word	date
3	4	fox	2020-11-24

In [12]:

df[df.word.str.startswith("f")]

Out[12]:

	id	word	date
3	4	fox	2020-11-24

The below example is convenient way to compare a datetime column with a literal string.

In [32]:

df.query("date.astype('str') >= '2020-06-08'")

Out[32]:

	id	word	date
3	4	fox	2020-06-08
4	5	.	2020-06-09

Use Slicing¶

In [7]:

df[df.x % 2 == 0]

Out[7]:

	x	y
1	2	4
3	4	2

`DataFrame.where`¶

The method DataFrame.where is not intended for filtring. It is an analog to case statement in SQL and if/else and ternary expression in programming languages. The condition that DataFrame.where takes can be a bool Series/DataFrame, array-like, or callable.

In [14]:

df.where(lambda r: r.id % 2 == 0)

Out[14]:

	id	word
0	NaN	NaN
1	2.0	is
2	NaN	NaN
3	4.0	fox
4	NaN	NaN

In [18]:

df.where(lambda r: r.id % 2 == 0, df * 3)

Out[18]:

	id	word
0	3	MaryMaryMary
1	2	is
2	9	aaa
3	4	fox
4	15	...

In [9]:

df.where(lambda r: r.x % 2 == 0).dropna()

Out[9]:

	x	y
1	2.0	4.0
3	4.0	2.0

In [19]:

df.id.where(lambda x: x % 2 == 0)

Out[19]:

0    NaN
1    2.0
2    NaN
3    4.0
4    NaN
Name: id, dtype: float64

In [20]:

df.id.where(lambda x: x % 2 == 0, 0)

Out[20]:

0    0
1    2
2    0
3    4
4    0
Name: id, dtype: int64

References¶

High-Performance Pandas: eval() and query()

pandas.DataFrame.where

In [ ]:

Ben Chuanlong Du's Blog

It is never too late to learn.

Filter pandas DataFrames in Python

Summary¶

`DataFrame.query`¶

Use Slicing¶

`DataFrame.where`¶

References¶

Comments

Summary¶

DataFrame.query¶

Use Slicing¶

DataFrame.where¶

References¶

Comments

`DataFrame.query`¶

`DataFrame.where`¶