Copy the following dataframe to your clipboard:
   textId  score              textInfo
0   name1    1.0            text_stuff
1   name1    2.0  different_text_stuff
2   name1    2.0            text_stuff
3   name2    1.0  different_text_stuff
4   name2    1.3  different_text_stuff
5   name2    2.0  still_different_text
6   name2    1.0              yoko ono
7   name2    3.0     I lika da Gweneth
8   name3    1.0     Always a tradeoff
9   name3    3.0                What?!
Now use
import pandas as pd
df = pd.read_clipboard(sep=r'\s\s+')
to load it into your environment. How does one slice this dataframe such that all the rows of a particular textId are returned if the score group of that textId includes at least one score equal to each of 1.0, 2.0 and 3.0? Here, the desired result would exclude the name1 rows, since that score group is missing a 3.0, and exclude the name3 rows, since that score group is missing a 2.0:
   textId  score              textInfo
0   name2    1.0  different_text_stuff
1   name2    1.3  different_text_stuff
2   name2    2.0  still_different_text
3   name2    1.0              yoko ono
4   name2    3.0     I lika da Gweneth
Attempts
- df[(df.textId == "textIdRowName") & (df.score == 1.0) & (df.score == 2.0) & (df.score == 3.0)] isn't right, since the condition isn't acting on the textId group but only on individual rows, and no single row can have a score equal to all three values. If this could be rewritten to match against textId groups, then it could be placed in a for loop and fed the unique textIdRowName's. Such a function would collect the names of the textId in a series (say textIdThatMatchScore123) that could then be used to slice the original df like df[df.textId.isin(textIdThatMatchScore123)].
- Failing at groupby.
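For reference, here is a minimal sketch of the group-level check described in the first attempt. It assumes exact float comparison is acceptable for these scores, and it rebuilds the frame inline so it does not depend on the clipboard. The names `required`, `result` and `filtered` are introduced here purely for illustration. It collects the matching textIds and slices with isin, then shows what is probably the equivalent `groupby(...).filter(...)` call.

```python
import pandas as pd

# Rebuild the example frame inline (same rows as the clipboard table).
df = pd.DataFrame({
    "textId": ["name1"] * 3 + ["name2"] * 5 + ["name3"] * 2,
    "score": [1.0, 2.0, 2.0, 1.0, 1.3, 2.0, 1.0, 3.0, 1.0, 3.0],
    "textInfo": [
        "text_stuff", "different_text_stuff", "text_stuff",
        "different_text_stuff", "different_text_stuff", "still_different_text",
        "yoko ono", "I lika da Gweneth", "Always a tradeoff", "What?!",
    ],
})

# Scores that every kept textId group must contain (illustrative name).
required = {1.0, 2.0, 3.0}

# Loop over the textId groups and keep the ids whose scores cover `required`.
# Iterating a grouped column yields (group name, Series of scores) pairs.
textIdThatMatchScore123 = [
    text_id
    for text_id, scores in df.groupby("textId")["score"]
    if required.issubset(scores)
]

# Slice the original frame with the collected ids, as proposed in the attempt.
result = df[df.textId.isin(textIdThatMatchScore123)]

# Alternative: let groupby.filter keep or drop whole groups in one call.
filtered = df.groupby("textId").filter(lambda g: required.issubset(g["score"]))

print(result)
```

Both variants keep the original row labels, so the result would need `reset_index(drop=True)` to match the 0-based index shown in the desired output.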