Python equivalent of R's match() for indexing

2024/9/16 23:07:35

So I essentially want to implement the equivalent of R's match() function in Python using pandas DataFrames, without using a for-loop.

In R, match() returns a vector of the positions of the (first) matches of its first argument in its second.

Let's say I have two DataFrames, A and B, that both contain a column C, where

A$C = c('a','b')
B$C = c('c','c','b','b','c','b','a','a')

In R, we would get

match(A$C,B$C) = c(7,3)
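To pin down the semantics a pandas answer has to reproduce, here is a plain-Python reference sketch (the name `match_reference` is hypothetical). It deliberately loops, so it is not the vectorized solution being asked for; it only shows what "first match, 1-based position" means, with `None` standing in for R's `NA`.

```python
def match_reference(x, table):
    """Hypothetical reference for R's match(): 1-based position of the first
    occurrence of each element of x in table, or None if it never occurs."""
    first = {}
    for pos, value in enumerate(table, start=1):
        # setdefault only stores the position the first time a value is seen
        first.setdefault(value, pos)
    # Look up each element of x in the order given, like R does
    return [first.get(value) for value in x]


print(match_reference(['a', 'b'], ['c', 'c', 'b', 'b', 'c', 'b', 'a', 'a']))
```

This prints `[7, 3]`, the same as R's `c(7,3)`.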

What is an equivalent method in Python for columns in pandas DataFrames that doesn't require looping through the values?

Answer

Here is a one-liner:

B.reset_index().groupby('C')['index'].first()[A.C].values

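This solution returns the results in the same order as the input A, as match does in R. Note that the positions are 0-based, so add 1 to reproduce R's c(7,3). The positions come from the 'index' column created by reset_index(), so they equal row positions only when B has the default RangeIndex. Also, unlike R, which returns NA for a value with no match, the lookup raises a KeyError if a value of A.C does not occur in B.C.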
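To see why the one-liner works, it can be split into its intermediate steps. This sketch assumes B has pandas' default RangeIndex, as in the full example, so the 'index' column created by reset_index() holds 0-based row positions; the intermediate variable names are only illustrative.

```python
import pandas as pd

A = pd.DataFrame({'C': ['a', 'b']})
B = pd.DataFrame({'C': ['c', 'c', 'b', 'b', 'c', 'b', 'a', 'a']})

# reset_index() moves B's default RangeIndex into a column named 'index',
# so every row now carries its 0-based position.
with_pos = B.reset_index()

# Within each group of equal C values, first() keeps the first 'index' value,
# i.e. the position of the first occurrence. The result is indexed by C.
first_pos = with_pos.groupby('C')['index'].first()

# Label-based lookup in the order of A.C, then convert to a NumPy array.
result = first_pos.loc[A.C].values
print(result)
```

Here `first_pos` maps `'a'` to 6, `'b'` to 2 and `'c'` to 0, and `result` is `array([6, 2])`.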


Full example:

import pandas as pd

A = pd.DataFrame({'C': ['a', 'b']})
B = pd.DataFrame({'C': ['c', 'c', 'b', 'b', 'c', 'b', 'a', 'a']})
B.reset_index().groupby('C')['index'].first()[A.C].values

Output: array([6, 2])
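If you want output that lines up exactly with R, a small wrapper can shift to 1-based positions and return NaN (rather than raising KeyError) for values that do not occur in the table. This is a sketch with a hypothetical helper name, `r_match`; it swaps the groupby step for Series.drop_duplicates() plus Series.reindex(), and it works on row positions rather than on B's index labels.

```python
import numpy as np
import pandas as pd


def r_match(x, table):
    """Hypothetical helper mimicking R's match().

    Returns the 1-based position of the first occurrence of each element of
    x in table, with NaN where an element has no match (R would give NA).
    """
    # Use row positions, not whatever index labels table may carry.
    table = pd.Series(table).reset_index(drop=True)
    # drop_duplicates() keeps the first occurrence; its index is the 0-based position.
    first = table.drop_duplicates()
    # Map each distinct value to that position (this index is unique).
    lookup = pd.Series(first.index, index=first.to_numpy())
    # reindex() inserts NaN for values of x that never occur in table.
    positions = lookup.reindex(pd.Index(x)).to_numpy()
    # Shift from 0-based (Python) to 1-based (R).
    return positions + 1


A = pd.DataFrame({'C': ['a', 'b']})
B = pd.DataFrame({'C': ['c', 'c', 'b', 'b', 'c', 'b', 'a', 'a']})
print(r_match(A.C, B.C))
```

This prints `[7 3]`. If any value is missing, the result becomes a float array with NaN in that slot, since NumPy integer arrays cannot hold a missing value.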

Edit (2023-04-12): In newer versions of pandas, .loc returns every row whose index label matches. Because B.C contains duplicates, the previous solution (B.reset_index().set_index('C').loc[A.C, 'index'].values) would return all the matches instead of only the first ones.
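The duplicate-label behaviour described in the edit can be reproduced directly, together with one way to restore first-match semantics for the .loc-based approach: drop the later duplicates before setting the index. This is a sketch using the example frames; the variable names are illustrative.

```python
import pandas as pd

A = pd.DataFrame({'C': ['a', 'b']})
B = pd.DataFrame({'C': ['c', 'c', 'b', 'b', 'c', 'b', 'a', 'a']})

# With duplicate labels in the index, .loc returns every matching row,
# i.e. all positions labelled 'a' and all positions labelled 'b'.
indexed = B.reset_index().set_index('C')
all_matches = indexed.loc[A.C, 'index'].values

# drop_duplicates('C') keeps only the first row for each value of C,
# so the index becomes unique and .loc returns one position per label.
first_only = B.reset_index().drop_duplicates('C').set_index('C')
first_matches = first_only.loc[A.C, 'index'].values
print(first_matches)
```

`first_matches` is `array([6, 2])`, the same as the groupby one-liner, while `all_matches` holds five positions (6, 7, 2, 3 and 5).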

https://en.xdnf.cn/q/72563.html
