Comments
- There are multiple ways to update the index of a DataFrame or Series. First, you can assign a new Series or Index object to the index of a DataFrame or Series. Alternatively, you can use methods such as DataFrame.set_index or DataFrame.reset_index. DataFrame.reset_index resets the index of a DataFrame to an integer index starting from 0 (Series.reset_index does the same for a Series). By default the old index is kept as a new column, but it can be dropped with the option drop=True. DataFrame.set_index sets the index of a DataFrame to the specified column and, by default, removes that column from the DataFrame. The same result can be achieved by assigning the column to the index of the DataFrame directly and then removing the column manually. Note that DataFrame.set_index, DataFrame.reset_index and Series.reset_index return new objects by default. The option inplace=True can be specified to update the original object in place.
import pandas as pd
df = pd.DataFrame(
{"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]}, index=["r1", "r2", "r3", "r4", "r5"]
)
df
df.set_index("x")
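The two routes described above (set_index versus assigning the column to the index and then dropping it) can be checked against each other. A minimal sketch, reusing the toy df; the variable names via_set_index, manual and kept are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]}, index=["r1", "r2", "r3", "r4", "r5"]
)

# Route 1: set_index moves column "x" into the index and drops it from the columns
via_set_index = df.set_index("x")

# Route 2: assign the column to the index, then remove the column manually
manual = df.copy()
manual.index = manual["x"]  # the new index takes the name "x" from the Series
manual = manual.drop(columns="x")

# Both routes give the same frame: index named "x", a single column "y"
pd.testing.assert_frame_equal(via_set_index, manual)

# drop=False keeps "x" as a regular column as well as using it as the index
kept = df.set_index("x", drop=False)

# df itself is untouched because set_index returns a new object by default
print(df.columns.tolist())  # ['x', 'y']
```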
reindex
DataFrame.reindex does NOT relabel the existing rows.
It selects and rearranges rows according to the specified labels, and any label that is not in the original index yields a row filled with NaN.
If you want to change the index but keep the original order of rows,
just assign new values to the index of the DataFrame
or call the method reset_index(drop=True).
import pandas as pd
df = pd.DataFrame(
{"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]}, index=["r1", "r2", "r3", "r4", "r5"]
)
df.head()
df.reindex(index=range(0, df.shape[0]))
df.reindex(index=["r1", "r3", "r5", "r2", "r4"])
x = df.copy()
print(x)
x.index = range(1, 6)
x
x = df.copy()
x.reset_index()
x = df.copy()
x.reset_index(drop=True, inplace=True)
x
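The difference between looking rows up by label (reindex) and relabeling them (assigning to the index) is easiest to see side by side. A small sketch on the same toy df; "r9" is a made-up label chosen because it does not exist in the index.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]}, index=["r1", "r2", "r3", "r4", "r5"]
)

# reindex looks rows up by label; "r9" is not in the index, so its row is all NaN
looked_up = df.reindex(index=["r3", "r1", "r9"])
print(looked_up)

# None of the labels 0..4 exist in the original index, so every row is NaN
all_missing = df.reindex(index=range(df.shape[0]))

# Assigning to .index relabels the rows but keeps their order and values
relabeled = df.copy()
relabeled.index = range(5)
print(relabeled)
```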
reset_index
By default reset_index returns a copy rather than modifying the original data frame.
You can specify inplace=True to override this behavior.
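A quick way to see the difference: the default call hands back a new frame, while inplace=True changes the frame itself and returns None. A minimal sketch on a tiny made-up frame.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]}, index=["r1", "r2", "r3"])

# Default: a new DataFrame is returned and df keeps its labels
out = df.reset_index()
print(df.index.tolist())  # ['r1', 'r2', 'r3']

# inplace=True modifies df itself and returns None
result = df.reset_index(inplace=True)
print(result)  # None
print(df.columns.tolist())  # ['index', 'x']
```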
Series
- If you drop the original index, you still get a Series. However, if you reset the index of a Series without dropping the original index, you get a DataFrame.
s = pd.Series([1, 2, 3, 4], index=["r1", "r2", "r3", "r4"])
s
df = s.reset_index()
df
s2 = s.reset_index(drop=True)
s2
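When the Series has no name, the column holding its values in the resulting DataFrame is called 0, which is awkward to work with. Series.reset_index accepts a name argument for that column; the label "value" below is just an example.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=["r1", "r2", "r3", "r4"])

# Without drop=True, an unnamed Series becomes a DataFrame with columns "index" and 0
frame = s.reset_index()
print(frame.columns.tolist())  # ['index', 0]

# The name parameter sets the column that holds the Series values
named = s.reset_index(name="value")
print(named.columns.tolist())  # ['index', 'value']

# With drop=True the result is still a Series with a fresh 0..n-1 index
flat = s.reset_index(drop=True)
```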
DataFrame
import pandas as pd
df = pd.DataFrame(
{"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]}, index=["r1", "r2", "r3", "r4", "r5"]
)
df.head()
# keep the original index as a new column and create a new index
df.reset_index()
# drop the original index and create a new index
df.reset_index(drop=True)
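If the index has a name, reset_index uses it as the name of the new column instead of "index", and the new index is a RangeIndex. A short sketch; the index name "row" is an assumption added for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]}, index=["r1", "r2", "r3", "r4", "r5"]
)
df.index.name = "row"  # hypothetical index name for illustration

# The old index becomes a column named after the index
kept = df.reset_index()
print(kept.columns.tolist())  # ['row', 'x', 'y']

# The replacement index is a RangeIndex from 0 to n-1
dropped = df.reset_index(drop=True)
print(dropped.index)  # RangeIndex(start=0, stop=5, step=1)
```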
Multi-index
import pandas as pd
df = pd.DataFrame(
{"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]},
index=pd.MultiIndex.from_tuples(
[("r1", 0), ("r2", 1), ("r3", 2), ("r4", 3), ("r5", 4)]
),
)
df.head()
df.reset_index()
df.reset_index(drop=True)
# drop the second index level and keep the first one
df.reset_index(level=1, drop=True)
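For a MultiIndex whose levels have no names, pandas calls the new columns level_0, level_1 and so on. Passing level without drop=True moves just that level into a column. A sketch on the same two-level toy frame.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]},
    index=pd.MultiIndex.from_tuples(
        [("r1", 0), ("r2", 1), ("r3", 2), ("r4", 3), ("r5", 4)]
    ),
)

# Unnamed levels are turned into columns called level_0 and level_1
print(df.reset_index().columns.tolist())  # ['level_0', 'level_1', 'x', 'y']

# level=1 without drop=True moves only the second level into a column
partial = df.reset_index(level=1)
print(partial.columns.tolist())  # ['level_1', 'x', 'y']
print(partial.index.tolist())  # ['r1', 'r2', 'r3', 'r4', 'r5']
```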
Assign Index
import pandas as pd
df = pd.DataFrame(
{"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]}, index=["r1", "r2", "r3", "r4", "r5"]
)
df.head()
df.index = df.y
df
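Assigning a column to the index leaves that column in the frame, and the index picks up the column's name. Any list-like of the right length can be assigned; a length mismatch is rejected. A minimal sketch; the labels "a" to "e" are arbitrary.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]}, index=["r1", "r2", "r3", "r4", "r5"]
)

# Assigning a column keeps that column; the index takes the column's name
df.index = df.y
name_after_assign = df.index.name
print(name_after_assign)  # y
print(df.columns.tolist())  # ['x', 'y']

# Any list-like of matching length works
df.index = ["a", "b", "c", "d", "e"]

# A list of the wrong length raises ValueError and leaves the index unchanged
mismatch_raised = False
try:
    df.index = ["too", "short"]
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```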
Index to Series
An index can be converted to a Series object, which gives you access to the rich set of Series methods.
df.columns.to_series().loc[lambda s: s.index == "x"]  # Series.select was removed in pandas 1.0; filter with a boolean mask
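Index.to_series returns a Series whose index and values are both the original labels, so Series methods can be applied to the labels directly. A small sketch using Series.diff on a made-up numeric index named "t".

```python
import pandas as pd

idx = pd.Index([1, 3, 6], name="t")

# to_series() returns a Series whose index and values are both the labels
s = idx.to_series()
print(s.index.equals(idx), s.tolist())  # True [1, 3, 6]

# Series methods are now available, e.g. differences between consecutive labels
gaps = s.diff()
print(gaps.tolist())  # [nan, 2.0, 3.0]
```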
Multi-Index
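import pandas as pd
jj = pd.Series([10, 20, 30], index=pd.Index(["a", "b", "c"], name="key"))  # hypothetical example data
pd.MultiIndex.from_product([[jj.index.name], jj.index.values])
tuples = [(jj.index.name, v) for v in jj.index.values]  # (index name, label) pairs
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
pd.MultiIndex.from_tuples([(jj.index.name, v) for v in jj.index.values])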
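MultiIndex.from_product with a one-element outer list is a handy way to put every column of a frame under a common top-level label. A sketch; the outer label "group" is only an example.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# Prefix every column label with a group name, giving two-level columns
df.columns = pd.MultiIndex.from_product([["group"], df.columns])
print(df.columns.tolist())  # [('group', 'x'), ('group', 'y')]

# from_tuples builds the same index from explicit (outer, inner) pairs
same = pd.MultiIndex.from_tuples([("group", "x"), ("group", "y")])
print(df.columns.equals(same))  # True

# Selecting the outer label returns the original single-level columns
print(df["group"]["x"].tolist())  # [1, 2]
```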