Ben Chuanlong Du's Blog

It is never too late to learn.

Hands on pandas.Series in Python

pandas.Series.str

  1. The accessor pandas.Series.str can only be used on a Series of str values. If you invoke it on a Series of non-string values, you will either encounter an AttributeError (Can only use .str accessor with string values, which use np.object_ dtype in pandas) or get back a Series of NaNs. If you control the DataFrame, the preferred fix is to cast the column to str in the DataFrame itself.

     df.status = df.status.astype(str)
    
    

    Generally speaking, it is a good idea to make sure that a column always has the same type in a pandas DataFrame. If you do not want to cast the column to str in the DataFrame (for any reason), you can cast it on the fly in the computation, which leaves the type of the original column unchanged.

     df = df[df.status.astype(str).str.contains('Exit')]
  2. pandas.Series.str.replace supports regular expressions (in pandas 2.0 and later, pass regex=True explicitly, since the default there is a literal match).
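
The two points above can be sketched together in one small example. The DataFrame contents and the names exited and masked are made up for illustration. regex=True is passed explicitly because the default of that parameter differs between pandas versions.

```python
import pandas as pd

# Hypothetical data: a status column that mixes strings and an integer.
df = pd.DataFrame({"status": ["Exit 0", 137, "Running", "Exit 1"]})

# Cast on the fly; the original column keeps its mixed values.
status = df.status.astype(str)

# Keep only the rows whose status contains "Exit".
exited = df[status.str.contains("Exit")]

# Replace every run of digits with "N" using a regular expression.
masked = status.str.replace(r"\d+", "N", regex=True)

print(exited)
print(masked.tolist())  # ['Exit N', 'N', 'Running', 'Exit N']
```

Casting in the computation rather than in the DataFrame keeps df.status untouched, which is useful when other code still relies on the original values.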

In [4]:
import numpy as np
import pandas as pd
In [3]:
x = pd.Series([1, 2, 3])
x
Out[3]:
0    1
1    2
2    3
dtype: int64

Accessing .str on a Series of non-string values can raise an AttributeError.

In [5]:
x.str
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-eb971929b925> in <module>
----> 1 x.str

~/.local/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5061         if (name in self._internal_names_set or name in self._metadata or
   5062                 name in self._accessors):
-> 5063             return object.__getattribute__(self, name)
   5064         else:
   5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):

~/.local/lib/python3.7/site-packages/pandas/core/accessor.py in __get__(self, obj, cls)
    169             # we're accessing the attribute of the class, i.e., Dataset.geo
    170             return self._accessor
--> 171         accessor_obj = self._accessor(obj)
    172         # Replace the property with the accessor object. Inspired by:
    173         # http://www.pydanny.com/cached-property.html

~/.local/lib/python3.7/site-packages/pandas/core/strings.py in __init__(self, data)
   1794 
   1795     def __init__(self, data):
-> 1796         self._validate(data)
   1797         self._is_categorical = is_categorical_dtype(data)
   1798 

~/.local/lib/python3.7/site-packages/pandas/core/strings.py in _validate(data)
   1816             # (instead of test for object dtype), but that isn't practical for
   1817             # performance reasons until we have a str dtype (GH 9343)
-> 1818             raise AttributeError("Can only use .str accessor with string "
   1819                                  "values, which use np.object_ dtype in "
   1820                                  "pandas")

AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
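
The failure above, and the cast that avoids it, can be reproduced in a short script. This is only a sketch: the variable names raised and padded are illustrative, and zfill is simply one convenient string method to call after the cast.

```python
import pandas as pd

x = pd.Series([1, 2, 3])

# Accessing .str on an int64 Series raises AttributeError.
try:
    x.str
    raised = False
except AttributeError:
    raised = True

# After casting to str, the string methods are available.
padded = x.astype(str).str.zfill(3)

print(raised)           # True
print(padded.tolist())  # ['001', '002', '003']
```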

Invoking methods of pandas.Series.str on a Series of pathlib.Path objects yields a Series of NaNs.

In [7]:
from pathlib import Path

paths = pd.Series([Path("/root"), Path("abc.txt")])
paths
Out[7]:
0      /root
1    abc.txt
dtype: object
In [10]:
paths.str.upper()
Out[10]:
0   NaN
1   NaN
dtype: float64

A simple solution is to convert the Series to str first and then call the methods of pandas.Series.str.

In [11]:
paths.astype(str).str.upper()
Out[11]:
0      /ROOT
1    ABC.TXT
dtype: object

Series.sort_values places NaN values at the end of the result by default.

In [5]:
s = pd.Series([np.nan, 1, 3, 10, 5])
s
Out[5]:
0     NaN
1     1.0
2     3.0
3    10.0
4     5.0
dtype: float64
In [6]:
s.sort_values()
Out[6]:
1     1.0
2     3.0
4     5.0
3    10.0
0     NaN
dtype: float64