I'm wondering what the difference is when you merge by pd.merge versus dataframe.merge(), examples below:
pd.merge(dataframe1, dataframe2)
and
dataframe1.merge(dataframe2)
We have two functions at our disposal for almost the same task: pandas.merge() and DataFrame.merge().
pandas.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
Both look similar, so what's the advantage of using one over the other?
DataFrame.merge() simply calls pd.merge() internally, passing the calling frame as left, so df1.merge(df2) gives the same result as pd.merge(df1, df2).
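To check the equivalence directly, here is a minimal sketch. The frames `df1` and `df2` and their columns are made-up toy data, not anything from the question:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical toy data sharing a "key" column
df1 = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

via_function = pd.merge(df1, df2)  # function style
via_method = df1.merge(df2)        # method style

# Raises AssertionError if values, dtypes, index, or columns differ
assert_frame_equal(via_function, via_method)
print(via_method)
```

Both calls use the default inner join on the shared `key` column, so only the rows for `b` and `c` survive in either result.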
However, pd.merge() is a wrapping-style function while df1.merge() is a chaining-style method, which makes the latter easier to chain from left to right.
E.g., df1.merge(df2).merge(df3) is more readable (analogous to the %>% pipe operator in R) than pd.merge(pd.merge(df1, df2), df3).
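The readability difference shows up more clearly once arguments are involved. A small sketch with three made-up frames (the `id`, `a`, `b`, `c` columns are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical toy frames sharing an "id" column
df1 = pd.DataFrame({"id": [1, 2, 3], "a": ["x", "y", "z"]})
df2 = pd.DataFrame({"id": [1, 2, 3], "b": [10, 20, 30]})
df3 = pd.DataFrame({"id": [2, 3, 4], "c": [0.2, 0.3, 0.4]})

# Nested (function) style: has to be read from the inside out
nested = pd.merge(pd.merge(df1, df2, on="id"), df3, on="id")

# Chained (method) style: reads left to right, one step per line
chained = (
    df1
    .merge(df2, on="id")
    .merge(df3, on="id")
)

print(chained.equals(nested))
```

The two results are identical; the chained form just keeps each merge and its arguments on its own line.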
import pandas as pd
from IPython.display import display  # only needed outside Jupyter

d1 = pd.read_html('https://worldpopulationreview.com/countries')
pop = d1[0]
pop.info()  # Data for 232 countries, 7 columns
pop.head(3)

d2 = pd.read_html('https://worldpopulationreview.com/country-rankings/median-age')
age = d2[0]
age.info()  # Data for 221 countries, 5 columns
age.head(3)

display('pd.merge(): ', pd.merge(pop, age), 'df.merge(): ', pop.merge(age))