How to select dataframe rows according to multi-(other column)-condition on columnar groups?

2024/10/5 1:25:17

Copy the following dataframe to your clipboard:

  textId   score              textInfo
0  name1     1.0            text_stuff
1  name1     2.0  different_text_stuff
2  name1     2.0            text_stuff
3  name2     1.0  different_text_stuff
4  name2     1.3  different_text_stuff
5  name2     2.0  still_different_text
6  name2     1.0              yoko ono
7  name2     3.0     I lika da Gweneth
8  name3     1.0     Always a tradeoff
9  name3     3.0                What?!

Now use

import pandas as pd
df=pd.read_clipboard(sep='\s\s+')

to load it into your environment. How does one slice this dataframe such that all the rows of a particular textId are returned if the score group of that textId includes at least one score that equals 1.0, 2.0 and 3.0? Here, the desired operation's result would exclude textId rows name1 since its score group is missing a 3.0 and exclude name3 since its score group is missing a 2.0:

  textId   score              textInfo
0  name2     1.0  different_text_stuff
1  name2     1.3  different_text_stuff
2  name2     2.0  still_different_text
3  name2     1.0              yoko ono
4  name2     3.0     I lika da Gweneth

Attempts

  1. df[df.textId == "textIdRowName" & df.score == 1.0 & df.score == 2.0 & & df.score == 3.0] isn't right since the condition isn't acting on the textId group but only individual rows. If this could be rewritten to match against textId groups then it could be placed in a for loop and fed the unique textIdRowName's. Such a function would collect the names of the textId in a series (say textIdThatMatchScore123) that could then be used to slice the original df like df[df.textId.isin(textIdThatMatchScore123)].
  2. Failing at groupby.
Answer

Here's one solution - groupby textId, then keep only those groups where the unique values of score is a superset (>=) of [1.0, 2.0, 3.0].

In [58]: df.groupby('textId').filter(lambda x: set(x['score']) >= set([1.,2.,3.]))
Out[58]: textId  score              textInfo
3  name2    1.0  different_text_stuff
4  name2    1.3  different_text_stuff
5  name2    2.0  still_different_text
6  name2    1.0              yoko ono
7  name2    3.0     I lika da Gweneth
https://en.xdnf.cn/q/70545.html

Related Q&A

Python Recursive Search of Dict with Nested Keys

I recently had to solve a problem in a real data system with a nested dict/list combination. I worked on this for quite a while and came up with a solution, but I am very unsatisfied. I had to resort t…

Scrapy: how to catch download error and try download it again

During my crawling, some pages failed due to unexpected redirection and no response returned. How can I catch this kind of error and re-schedule a request with original url, not with the redirected url…

Cryptacular is broken

this weekend our docker image broke because it cannot be build anymore. While looking into the stats, I saw this line:crypt_blowfish-1.2/crypt.h:17:23: fatal error: gnu-crypt.h: No such file or directo…

how to run test against the built image before pushing to containers registry?

From the gitlab documentation this is how to create a docker image using kaniko: build:stage: buildimage:name: gcr.io/kaniko-project/executor:debugentrypoint: [""]script:- mkdir -p /kaniko/.d…

Adding a colorbar to a pcolormesh with polar projection

I am trying to add a colorbar to a pcolormesh plot with polar projection. The code works fine if I dont specify a polar projection. With polar projection specified, a tiny plot results, and the colorba…

GridSearch for Multi-label classification in Scikit-learn

I am trying to do GridSearch for best hyper-parameters in every individual one of ten folds cross validation, it worked fine with my previous multi-class classification work, but not the case this time…

Visualize tree in bash, like the output of unix tree

Given input:apple: banana eggplant banana: cantaloupe durian eggplant: fig:I would like to concatenate it into the format:├─ apple │ ├─ banana │ │ ├─ cantaloupe │ │ └─ durian │ └…

pygame.error: Failed loading libmpg123.dll: Attempt to access invalid address

music = pygame.mixer.music.load(not.mp3) pygame.mixer.music.play(loops=-1)when executing the code I got this error: Traceback (most recent call last):File "C:\Users\Admin\AppData\Local\Programs\Py…

Plot Red Channel from 3D Numpy Array

Suppose that we have an RGB image that we have converted it to a Numpy array with the following code:import numpy as np from PIL import Imageimg = Image.open(Peppers.tif) arr = np.array(img) # 256x256x…

How to remove image noise using opencv - python?

I am working with skin images, in recognition of skin blemishes, and due to the presence of noises, mainly by the presence of hairs, this work becomes more complicated.I have an image example in which …