So I essentially want to implement the equivalent of R's match() function in Python, using pandas DataFrames, without using a for-loop.
In R, match() returns a vector of the positions of the (first) matches of its first argument in its second.
Let's say that I have two data frames, A and B, both of which include the column C, where
A$C = c('a','b')
B$C = c('c','c','b','b','c','b','a','a')
In R, we would get
match(A$C,B$C) = c(7,3)
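To pin down the semantics a pandas answer has to reproduce, here is a small pure-Python reference sketch of match(). It deliberately uses a loop, so it is only meant for checking results, not as the loop-free answer being asked for. The helper name r_match is hypothetical, and returning None stands in for R's NA.

```python
def r_match(x, table):
    """Hypothetical reference for R's match(): 1-based position of the first
    occurrence of each element of x in table, or None if it never occurs."""
    first_pos = {}
    # Walk table once, remembering only the first position of each value.
    for pos, value in enumerate(table, start=1):
        first_pos.setdefault(value, pos)
    # Look up each element of x in its original order.
    return [first_pos.get(value) for value in x]

print(r_match(['a', 'b'], ['c', 'c', 'b', 'b', 'c', 'b', 'a', 'a']))  # [7, 3]
```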
What is an equivalent method in Python for columns in pandas data frames that doesn't require looping through the values?
Here is a one-liner:
B.reset_index().groupby('C')['index'].first()[A.C].values
This solution returns the results in the same order as the input A, as match does in R.
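Broken into its intermediate steps, the one-liner does the following. This is a sketch using the same A and B as in the question; the variable names with_pos, first_pos and result are only for illustration.

```python
import pandas as pd

A = pd.DataFrame({'C': ['a', 'b']})
B = pd.DataFrame({'C': ['c', 'c', 'b', 'b', 'c', 'b', 'a', 'a']})

# Step 1: turn B's row labels (0..7) into an ordinary column named 'index'.
with_pos = B.reset_index()

# Step 2: for each distinct value of C, keep the first row label it occurs at.
# groupby sorts its keys, so this Series maps 'a' -> 6, 'b' -> 2, 'c' -> 0.
first_pos = with_pos.groupby('C')['index'].first()

# Step 3: look up A's values by label; the result follows A's order, not B's.
result = first_pos[A['C']].values
print(result)  # [6 2]
```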
Full example:
import pandas as pd
A = pd.DataFrame({'C': ['a', 'b']})
B = pd.DataFrame({'C': ['c', 'c', 'b', 'b', 'c', 'b', 'a', 'a']})
B.reset_index().groupby('C')['index'].first()[A.C].values
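Output: array([6, 2]). These are 0-based positions (B has pandas' default RangeIndex, which starts at 0), so they correspond to R's c(7, 3) shifted down by one.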
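Two differences from R remain: the positions are 0-based, and indexing with [A.C] raises a KeyError if A contains a value that never occurs in B, whereas R returns NA. A minimal sketch, assuming you want R's conventions exactly, is below. The helper match_like_r is hypothetical; it relies on Series.drop_duplicates (which keeps first occurrences by default) and Series.reindex (which fills misses with NaN, standing in for NA).

```python
import pandas as pd

def match_like_r(x: pd.Series, table: pd.Series) -> pd.Series:
    """Hypothetical helper mimicking R's match(): 1-based position of the first
    occurrence of each value of x in table, NaN where there is no match."""
    # Use plain 0..n-1 positions regardless of table's own index labels.
    table = table.reset_index(drop=True)
    # drop_duplicates keeps the first occurrence of each value, with its position.
    firsts = table.drop_duplicates()
    # Flip it into a lookup Series: value -> first position.
    lookup = pd.Series(firsts.index, index=firsts.values)
    # reindex (unlike []) yields NaN instead of raising KeyError for misses;
    # +1 converts to R's 1-based positions.
    return lookup.reindex(x.values) + 1

A = pd.DataFrame({'C': ['a', 'b', 'z']})
B = pd.DataFrame({'C': ['c', 'c', 'b', 'b', 'c', 'b', 'a', 'a']})
print(match_like_r(A['C'], B['C']).tolist())  # [7.0, 3.0, nan]
```

Because NaN forces a float dtype, matched positions come back as floats; that is the price of representing NA in a plain NumPy-backed Series.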
Edit (2023-04-12): In newer versions of pandas, .loc matches all rows that match the condition. Thus, the previous solution (B.reset_index().set_index('C').loc[A.C, 'index'].values) would return all the matches instead of only the first ones.
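If you prefer the .loc form, one way to keep it, sketched here as an alternative rather than the answer's original method, is to drop duplicate values of C before setting the index, so that each label maps to exactly one row (its first occurrence).

```python
import pandas as pd

A = pd.DataFrame({'C': ['a', 'b']})
B = pd.DataFrame({'C': ['c', 'c', 'b', 'b', 'c', 'b', 'a', 'a']})

# With a non-unique index, .loc returns every row labelled 'a' or 'b'.
all_matches = B.reset_index().set_index('C').loc[A['C'], 'index'].values

# Keeping only the first row per value makes the index unique again,
# so .loc returns exactly one position per element of A.
first_matches = (
    B.reset_index()
     .drop_duplicates('C')
     .set_index('C')
     .loc[A['C'], 'index']
     .values
)
print(first_matches)  # [6 2]
```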