In [1]:
import pandas as pd
import numpy as np
import datetime
Summary¶
There are many ways to filter rows of a pandas DataFrame. The slicing way is the universal way, however, the method
DataFrame.query
can be more convenient for simple filtering conditions and it is more efficient too.DataFrame.where
is not intened for filtering but rather a analog tocase
statement in SQL andif/else
or ternary expressions.
In [2]:
df = pd.DataFrame(
{
"id": [1, 2, 3, 4, 5],
"word": ["Mary", "is", "a", "fox", "."],
"date": [datetime.date.today() + datetime.timedelta(days=i) for i in range(5)],
}
)
df
Out[2]:
DataFrame.query
¶
In [3]:
df.query("id % 2 == 0")
Out[3]:
In [4]:
df.query("id in [3, 5, 7]")
Out[4]:
In [29]:
x = [3, 5, 7]
df.query("id in @x")
Out[29]:
In [30]:
df.query("word.str.len() == id")
Out[30]:
In [31]:
df.query("word.str.lower() == 'mary'")
Out[31]:
In [10]:
df.query("word.str.startswith('f')")
Out[10]:
In [12]:
df[df.word.str.startswith("f")]
Out[12]:
The below example is convenient way to compare a datetime column with a literal string.
In [32]:
df.query("date.astype('str') >= '2020-06-08'")
Out[32]:
Use Slicing¶
In [7]:
df[df.x % 2 == 0]
Out[7]:
DataFrame.where
¶
The method DataFrame.where
is not intended for filtring.
It is an analog to case
statement in SQL and if/else
and ternary expression in programming languages.
The condition that DataFrame.where
takes can be a bool Series/DataFrame, array-like, or callable.
In [14]:
df.where(lambda r: r.id % 2 == 0)
Out[14]:
In [18]:
df.where(lambda r: r.id % 2 == 0, df * 3)
Out[18]:
In [9]:
df.where(lambda r: r.x % 2 == 0).dropna()
Out[9]:
In [19]:
df.id.where(lambda x: x % 2 == 0)
Out[19]:
In [20]:
df.id.where(lambda x: x % 2 == 0, 0)
Out[20]:
In [ ]: