Comments¶

There are several ways to update the index of a DataFrame or Series. You can assign a new Series or Index object directly to the index attribute, or you can use methods such as DataFrame.set_index and DataFrame.reset_index. reset_index replaces the index of a DataFrame or Series with an integer index starting from 0. By default the old index is kept as a column; pass drop=True to discard it. DataFrame.set_index sets the index of a DataFrame to the specified column and removes that column from the DataFrame. The same result can be achieved by assigning the column to the index of the DataFrame and then removing the column manually. Note that by default DataFrame.set_index, DataFrame.reset_index and Series.reset_index return new objects rather than modifying the original. Pass inplace=True to make the update in place.

set_index ¶

In [1]:

import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]}, index=["r1", "r2", "r3", "r4", "r5"]
)

df

Out[1]:

	x	y
r1	1	5
r2	2	4
r3	3	3
r4	4	2
r5	5	1

In [2]:

df.set_index("x")

Out[2]:

	y
x
1	5
2	4
3	3
4	2
5	1
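The introduction notes that set_index can be reproduced by assigning the column to the index and then dropping the column. The following is a minimal sketch of that equivalence on the same sample data; the names by_method and by_hand are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]}, index=["r1", "r2", "r3", "r4", "r5"]
)

# One step: set_index returns a new DataFrame and leaves df untouched.
by_method = df.set_index("x")

# By hand: copy, assign the column to the index, then drop the column.
by_hand = df.copy()
by_hand.index = by_hand["x"]
by_hand = by_hand.drop(columns="x")

print(by_method.equals(by_hand))  # compares values and index labels
print(list(df.columns))  # the original still has both columns
```

If the column should remain available as data as well, set_index also accepts drop=False.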

reindex¶

DataFrame.reindex does NOT relabel the existing rows. It only rearranges (or selects) rows according to the specified index labels, and any requested label that does not already exist yields a row of NaN. If you want to change the index but keep the original order of the rows, assign new values to the index of the DataFrame or call reset_index(drop=True).

In [1]:

import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]}, index=["r1", "r2", "r3", "r4", "r5"]
)

df.head()

Out[1]:

	x	y
r1	1	5
r2	2	4
r3	3	3
r4	4	2
r5	5	1

In [11]:

df.reindex(index=range(0, df.shape[0]))

Out[11]:

	x	y
0	NaN	NaN
1	NaN	NaN
2	NaN	NaN
3	NaN	NaN
4	NaN	NaN

In [12]:

df.reindex(index=["r1", "r3", "r5", "r2", "r4"])

Out[12]:

	x	y
r1	1	5
r3	3	3
r5	5	1
r2	2	4
r4	4	2
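The two outputs above differ because reindex matches on labels. As a small sketch of that behavior, the example below mixes existing labels with one that does not exist ("r9", chosen arbitrarily) and uses the fill_value parameter to replace the default NaN.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]}, index=["r1", "r2", "r3", "r4", "r5"]
)

# "r9" is not an existing label, so its row is created and filled with NaN.
partial = df.reindex(index=["r5", "r9", "r1"])

# fill_value replaces the default NaN for labels that are missing.
filled = df.reindex(index=["r5", "r9", "r1"], fill_value=0)

print(partial)
print(filled)
print(df.index.tolist())  # reindex returned new objects; df is unchanged
```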

In [4]:

x = df.copy()
x.index = range(1, 6)
x

Out[4]:

	x	y
1	1	5
2	2	4
3	3	3
4	4	2
5	5	1

In [22]:

x = df.copy()
x.reset_index()

Out[22]:

	index	x	y
0	r1	1	5
1	r2	2	4
2	r3	3	3
3	r4	4	2
4	r5	5	1

In [3]:

x = df.copy()
x.reset_index(drop=True, inplace=True)
x

Out[3]:

	x	y
0	1	5
1	2	4
2	3	3
3	4	2
4	5	1

reset_index ¶

By default, reset_index returns a new object rather than modifying the original DataFrame. Specify inplace=True to override this behavior.
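The difference between the two modes can be sketched as follows. The three-row frame and the names before and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]}, index=["r1", "r2", "r3"])
before = df.index.tolist()

# Default: a new DataFrame comes back and df keeps its labels.
result = df.reset_index(drop=True)
print(df.index.tolist() == before)
print(result.index.tolist())

# inplace=True: df itself is modified, so keep using df
# rather than assigning the return value of the call.
df.reset_index(drop=True, inplace=True)
print(df.index.tolist())
```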

Series¶

If you drop the original index, the result is still a Series. However, if you reset the index of a Series without dropping the original index, you get a DataFrame.

In [5]:

s = pd.Series([1, 2, 3, 4], index=["r1", "r2", "r3", "r4"])
s

Out[5]:

r1    1
r2    2
r3    3
r4    4
dtype: int64

In [8]:

df = s.reset_index()
df

Out[8]:

	index	0
0	r1	1
1	r2	2
2	r3	3
3	r4	4

In [10]:

df = s.reset_index(drop=True)
df

Out[10]:

0    1
1    2
2    3
3    4
dtype: int64
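In the DataFrame produced above, the value column is labeled 0 because the Series has no name. As a hedged sketch, Series.reset_index accepts a name parameter to label that column; the label "value" here is an arbitrary choice.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=["r1", "r2", "r3", "r4"])

# name= labels the value column of the resulting DataFrame.
as_frame = s.reset_index(name="value")
print(type(as_frame).__name__)
print(as_frame.columns.tolist())

# With drop=True the old labels are discarded and the result stays a Series.
as_series = s.reset_index(drop=True)
print(type(as_series).__name__)
```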

DataFrame¶

In [15]:

import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]}, index=["r1", "r2", "r3", "r4", "r5"]
)

df.head()

Out[15]:

	x	y
r1	1	5
r2	2	4
r3	3	3
r4	4	2
r5	5	1

In [29]:

# keep the original index as a new column and create a new index
df.reset_index()

Out[29]:

	index	x	y
0	r1	1	5
1	r2	2	4
2	r3	3	3
3	r4	4	2
4	r5	5	1

In [30]:

# drop the original index and create a new index
df.reset_index(drop=True)

Out[30]:

	x	y
0	1	5
1	2	4
2	3	3
3	4	2
4	5	1

Multi-index¶

In [31]:

import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]},
    index=pd.MultiIndex.from_tuples(
        [("r1", 0), ("r2", 1), ("r3", 2), ("r4", 3), ("r5", 4)]
    ),
)

df.head()

Out[31]:

		x	y
r1	0	1	5
r2	1	2	4
r3	2	3	3
r4	3	4	2
r5	4	5	1

In [32]:

df.reset_index()

Out[32]:

	level_0	level_1	x	y
0	r1	0	1	5
1	r2	1	2	4
2	r3	2	3	3
3	r4	3	4	2
4	r5	4	5	1

In [33]:

df.reset_index(drop=True)

Out[33]:

	x	y
0	1	5
1	2	4
2	3	3
3	4	2
4	5	1

In [38]:

# drop the 2nd index level and keep the first
df.reset_index(level=1, drop=True)

Out[38]:

	x	y
r1	1	5
r2	2	4
r3	3	3
r4	4	2
r5	5	1
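The columns level_0 and level_1 above are placeholder names used because the index levels are unnamed. A minimal sketch with named levels follows; the names "row" and "pos" are assumptions made for this example. It also shows level= without drop=True, which moves a level into a column instead of discarding it.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3], "y": [3, 2, 1]},
    index=pd.MultiIndex.from_tuples(
        [("r1", 0), ("r2", 1), ("r3", 2)], names=["row", "pos"]
    ),
)

# Named levels become columns with those names instead of level_0 / level_1.
print(df.reset_index().columns.tolist())

# Move only the "pos" level into a column; "row" stays as the index.
moved = df.reset_index(level="pos")
print(moved.columns.tolist())
print(moved.index.name)
```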

Assign Index¶

In [1]:

import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]}, index=["r1", "r2", "r3", "r4", "r5"]
)

df.head()

Out[1]:

	x	y
r1	1	5
r2	2	4
r3	3	3
r4	4	2
r5	5	1

In [2]:

df.index = df.y
df

Out[2]:

	x	y
y
5	1	5
4	2	4
3	3	3
2	4	2
1	5	1
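Two properties of direct assignment are worth a quick sketch: it changes the object in place, and the new index needs one label per row. The labels used below are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [3, 2, 1]}, index=["r1", "r2", "r3"])

# Assignment changes df itself; no copy is returned.
df.index = ["a", "b", "c"]
print(df.index.tolist())

# The new index must have exactly one label per row.
try:
    df.index = ["only", "two"]
except ValueError as err:
    print("rejected:", err)

# The failed assignment leaves the previous labels in place.
print(df.index.tolist())
```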

Index to Series¶

An Index can be converted to a Series, which lets it benefit from the rich set of Series methods.

In [49]:

# Series.select was removed in pandas 1.0; a callable passed to .loc filters by label instead
df.columns.to_series().loc[lambda s: s.index == "x"]

Out[49]:

x    x
dtype: object
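As one possible use of this conversion, the sketch below builds a boolean mask from the column labels and uses it to select columns. The column names x_raw, x_clean and y are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"x_raw": [1, 2], "x_clean": [1, 2], "y": [3, 4]})

# to_series() yields a Series whose index and values are both the column labels.
cols = df.columns.to_series()

# Build a boolean mask with Series string methods, then use it to pick columns.
mask = cols.str.startswith("x_")
selected = df.loc[:, mask]
print(selected.columns.tolist())
```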

Multi-Index¶

In [ ]:

pd.MultiIndex.from_product([[jj.index.name], jj.index.values])

In [ ]:

index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])

In [ ]:

pd.MultiIndex.from_tuples([(jj.index.name, v) for v in jj.index.values])
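The cells above refer to objects (jj and tuples) that are not defined in this post. The following runnable sketch assumes jj is a DataFrame with a named index and that tuples is a list of label pairs; both are hypothetical stand-ins, not the original data.

```python
import pandas as pd

# Hypothetical stand-in for the undefined `jj`: any object with a named index.
jj = pd.DataFrame({"x": [1, 2, 3]}, index=pd.Index(["a", "b", "c"], name="key"))

# Cartesian product of the index name with every label.
from_product = pd.MultiIndex.from_product([[jj.index.name], jj.index.values])

# The same pairs written out as explicit tuples.
from_tuples = pd.MultiIndex.from_tuples([(jj.index.name, v) for v in jj.index.values])

# names= labels the levels; `tuples` here is made-up sample data.
tuples = [("a", 1), ("a", 2), ("b", 1)]
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])

print(from_product.tolist())
print(from_product.equals(from_tuples))
print(list(index.names))
```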

Ben Chuanlong Du's Blog

It is never too late to learn.

Understand Index in pandas
