Reference¶
https://pandas.pydata.org/pandas-docs/stable/merging.html
http://stackoverflow.com/questions/22676081/pandas-the-difference-between-join-and-merge
Comment¶
You are able to specify (via
left_on
andright_on
) which columns to join in each data frame.Columns that appear in both data frames but not used in joining are distinguished using suffixes.
In [19]:
import pandas as pd
df1 = pd.DataFrame({"x": [1, 2, 3], "y": [5, 4, 3]})
print(df1)
df2 = pd.DataFrame({"x": [10, 20, 30], "z": ["a", "b", "c"]})
print(df2)
Default Join¶
Columns (x in this case) appear in both data frames are used for joining.
In [14]:
df1.merge(df2)
Out[14]:
Join on Index¶
In [16]:
df1.merge(df2, left_index=True, right_index=True)
Out[16]:
Join on Specified Columns¶
In [21]:
import pandas as pd
df1 = pd.DataFrame({"id": [1, 2, 3], "v": [5, 4, 3]})
print(df1)
df2 = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
print(df2)
In [23]:
df1.merge(df2, left_on="id", right_on="x")
Out[23]:
Cartesion/Cross Join¶
In [3]:
import pandas as pd
df1 = pd.DataFrame({"id": [1, 2], "v": [5, 4]})
df1
Out[3]:
In [4]:
df2 = pd.DataFrame({"x": [10, 20], "y": ["a", "b"]})
df2
Out[4]:
In [5]:
df1.assign(key=1).merge(df2.assign(key=1))
Out[5]:
In [7]:
df1.assign(key=1).merge(df2.assign(key=1)).drop("key", axis=1)
Out[7]:
In [ ]: