Check if one series is subset of another in Pandas

2024/7/4 17:08:37

I have 2 columns from 2 different dataframes. I want to check if column 1 is a subset of column 2.

I was using the following code:

set(col1).issubset(set(col2))

The problem is that if col1 contains only integers and col2 contains both integers and strings, this returns False, because every element of col2 is coerced to a string. For example, the intersection of these two sets is empty:

set([376, 264, 365, 302]) & 
set(['302', 'water', 'nist1950', '264', '365', '376'])
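To make the mismatch concrete, here is a minimal sketch using the same values (the names ints and strs are only illustrative). Python never treats the integer 376 and the string '376' as equal, so neither the intersection nor the subset test can match them:

```python
# Illustrative names; the values are the ones from the example above.
ints = {376, 264, 365, 302}
strs = {'302', 'water', 'nist1950', '264', '365', '376'}

# An int and its string form are different objects with different hashes.
print(376 == '376')         # False

# So the two sets share no elements, and the subset test fails.
print(ints & strs)          # set()
print(ints.issubset(strs))  # False
```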

I tried using isin from pandas, but if col1 and col2 are Series, it returns a Series of Boolean values. I want a single True or False.

How do I solve this? Is there a simpler function that I have missed?

Edit 1

Adding an example.

col1
0    365
1    376
2    302
3    264
Name: subject, dtype: int64

col2
0     nist1950
1     nist1950
2        water
3        water
4          376
5          376
6          302
7          302
8          365
9          365
10         264
11         264
12         376
13         376
Name: subject, dtype: object

Edit 2

col1 and col2 can contain integers, strings, floats, etc. I would rather not make any assumptions about what is in these columns.

Answer

You could use isin with all to check whether all elements of col1 are contained in col2. To convert the values to numbers, you could use pd.to_numeric:

import pandas as pd

s1 = pd.Series([376, 264, 365, 302])
s2 = pd.Series(['302', 'water', 'nist1950', '264', '365', '376'])
res = s1.isin(pd.to_numeric(s2, errors='coerce')).all()

In [213]: res
Out[213]: True

More detailed:

In [214]: pd.to_numeric(s2, errors='coerce')
Out[214]:
0    302
1    NaN
2    NaN
3    264
4    365
5    376
dtype: float64

In [215]: s1.isin(pd.to_numeric(s2, errors='coerce'))
Out[215]:
0    True
1    True
2    True
3    True
dtype: bool
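One caveat with the coercion above: errors='coerce' turns every non-numeric entry into NaN, and isin treats NaN as matching NaN. If col1 could itself contain missing values, they would then count as "found" only because col2 happened to contain some unrelated text. A minimal sketch of a guard against that, assuming you want NaN to never count as a match (the helper name is_numeric_subset is hypothetical, not a pandas function):

```python
import numpy as np
import pandas as pd


def is_numeric_subset(candidate, reference):
    """Return True if every value in candidate appears among the numeric
    values of reference. Hypothetical helper, not part of pandas."""
    # Coerce to numbers, then drop the NaN placeholders left by text entries
    # so they cannot accidentally match a NaN in candidate.
    ref_numeric = pd.to_numeric(reference, errors='coerce').dropna()
    return bool(candidate.isin(ref_numeric).all())


s1 = pd.Series([376, 264, 365, 302])
s2 = pd.Series(['302', 'water', 'nist1950', '264', '365', '376'])
s1_with_gap = pd.Series([376, np.nan])

print(is_numeric_subset(s1, s2))           # True
print(is_numeric_subset(s1_with_gap, s2))  # False
```

Wrapping the result in bool() also gives a plain Python True or False rather than a NumPy boolean.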

Note: pd.to_numeric requires pandas >= 0.17.0. On earlier versions you could use convert_objects with convert_numeric=True.

EDIT

If you prefer a solution based on sets, you could convert the elements of the first Series to str as well and then compare them with your original code:

s3 = set(map(str, s1))

In [234]: s3
Out[234]: {'264', '302', '365', '376'}

Then you could use issubset for s2:

In [235]: s3.issubset(s2)
Out[235]: True

or for set(s2):

In [236]: s3.issubset(set(s2))
Out[236]: True

EDIT2

s1 = pd.Series(['376', '264', '365', '302'])
s4 = pd.Series(['nist1950', 'nist1950', 'water', 'water', '376', '376', '302', '302', '365', '365', '264', '264', '376', '376'])

In [263]: s1.astype(float).isin(pd.to_numeric(s4, errors='coerce')).all()
Out[263]: True
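If, as in Edit 2 of the question, you would rather not assume anything about the types in either column, one option is to normalize every value to a common key before comparing sets. The following is only a sketch under stated assumptions: the helper names normalize and is_subset are hypothetical, anything float() can parse is treated as a number, and everything else is compared as a stripped string.

```python
import pandas as pd


def normalize(value):
    """Map a value to a comparable key (hypothetical helper).

    Numbers and numeric strings become floats, so 376, 376.0 and '376'
    all map to 376.0. Anything float() cannot parse is kept as a
    stripped string.
    """
    try:
        return float(value)
    except (TypeError, ValueError):
        return str(value).strip()


def is_subset(candidate, reference):
    """Return True if every normalized value of candidate is in reference."""
    return {normalize(v) for v in candidate} <= {normalize(v) for v in reference}


s1 = pd.Series([376, 264, 365, 302])
s2 = pd.Series(['302', 'water', 'nist1950', '264', '365', '376'])

print(is_subset(s1, s2))                          # True
print(is_subset(pd.Series([376.0, '264']), s2))   # True
print(is_subset(pd.Series([376, 'air']), s2))     # False
```

Be aware of the trade-offs in this sketch: booleans are treated as the numbers 1.0 and 0.0, strings such as 'inf' are parsed as floats, and missing values (NaN) generally will not match each other. Adjust normalize if your data needs different rules.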