How to check if a value exists in a DataFrame

2024/9/20 21:33:34

Hi, I am trying to get the name of the DataFrame column that contains a specific word.

For example, I have this DataFrame:

NA              good    employee
Not available   best    employer
not required    well    manager
not eligible    super   reportee

my_word = ["well"]
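To make the question reproducible, the example data can be built directly in code. This is a sketch under two assumptions: the "NA" cell is a missing value (the answer's printout shows it as NaN, and pd.read_csv also reads "NA" as NaN by default), and the frame has no column names, so pandas labels the columns 0, 1, 2.

```python
import numpy as np
import pandas as pd

# Example data from the question; the "NA" cell is assumed to be a missing value (NaN)
df = pd.DataFrame([
    [np.nan, "good", "employee"],
    ["Not available", "best", "employer"],
    ["not required", "well", "manager"],
    ["not eligible", "super", "reportee"],
])  # no column names given, so pandas uses the default labels 0, 1, 2

my_word = ["well"]
print(df)
```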

How can I check whether "well" exists in the DataFrame, and find the name of the column that contains it?

Thanks in advance!

Answer

Use DataFrame.isin to compare every cell with the values in my_word, and DataFrame.any to check whether each column contains at least one True:

m = df.isin(my_word).any()
print(m)
0    False
1     True
2    False
dtype: bool
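DataFrame.any reduces along axis=0 by default, which gives one result per column (that is why the index of m above holds the column labels). The sketch below, using the same example data, contrasts that with axis=1, which gives one result per row instead.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [np.nan, "good", "employee"],
    ["Not available", "best", "employer"],
    ["not required", "well", "manager"],
    ["not eligible", "super", "reportee"],
])
my_word = ["well"]

found = df.isin(my_word)      # element-wise: True where a cell equals an item of my_word

per_column = found.any()      # default axis=0: one bool per column
per_row = found.any(axis=1)   # axis=1: one bool per row

print(per_column.tolist())    # [False, True, False]
print(per_row.tolist())       # [False, False, True, False]
```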

Then get the column names by filtering the index of m with m itself (boolean indexing):

cols = m.index[m].tolist()
print(cols)
[1]
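The two steps can be wrapped in a small function so they work for any list of words. columns_containing is a hypothetical helper name used only for illustration; it simply combines isin, any and the index filter shown above.

```python
import numpy as np
import pandas as pd


def columns_containing(df, words):
    """Hypothetical helper: labels of columns with at least one cell equal to an item of words."""
    m = df.isin(words).any()     # one bool per column
    return m.index[m].tolist()   # keep only the labels where m is True


df = pd.DataFrame([
    [np.nan, "good", "employee"],
    ["Not available", "best", "employer"],
    ["not required", "well", "manager"],
    ["not eligible", "super", "reportee"],
])

print(columns_containing(df, ["well"]))             # [1]
print(columns_containing(df, ["well", "manager"]))  # [1, 2]
print(columns_containing(df, ["missing"]))          # []
```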

Data:

print(df)
               0      1         2
0            NaN   good  employee
1  Not available   best  employer
2   not required   well   manager
3   not eligible  super  reportee

Detail:

print(df.isin(my_word))
       0      1      2
0  False  False  False
1  False  False  False
2  False   True  False
3  False  False  False

print(df.isin(my_word).any())
0    False
1     True
2    False
dtype: bool
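As the detail shows, isin compares whole cell values, so a cell such as "very well" would not count as a match. If the word may appear inside longer text, one possible alternative (not part of the original answer) is a whole-word regex with Series.str.contains. This sketch uses small hypothetical data and assumes every column holds strings (missing values are fine), because the .str accessor raises on numeric columns.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical data: "well" only appears inside longer strings
df = pd.DataFrame({
    "a": [np.nan, "not required"],
    "b": ["good", "very well"],
    "c": ["wellness", "manager"],
})
my_word = ["well"]

# Exact matching finds nothing, because no cell is exactly "well"
print(df.isin(my_word).any().any())  # False

# One regex that matches any listed word as a whole word (\b marks a word boundary)
pattern = r"\b(?:" + "|".join(map(re.escape, my_word)) + r")\b"

# str.contains runs per column; na=False treats missing cells as "no match"
mask = df.apply(lambda col: col.str.contains(pattern, regex=True, na=False))
m = mask.any()
cols = m.index[m].tolist()
print(cols)  # ['b'] ("wellness" is not a whole-word match)
```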

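EDIT: To get the values of the matching columns as one flat list, note that .values.tolist() returns a nested list (one inner list per row), so it has to be flattened: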

my_word = ["well", "manager"]
m = df.isin(my_word).any()
print(m)
0    False
1     True
2     True
dtype: bool

nested = df.loc[:, m].values.tolist()
flat_list = [item for sublist in nested for item in sublist]
print(flat_list)
['good', 'employee', 'best', 'employer', 'well', 'manager', 'super', 'reportee']
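As an alternative to the list comprehension, NumPy can do the flattening: to_numpy() returns a 2-D array (rows by selected columns) and ravel() flattens it row by row, which gives the same order as the nested loop above. This is a sketch on the same example data, not the original answer's method.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [np.nan, "good", "employee"],
    ["Not available", "best", "employer"],
    ["not required", "well", "manager"],
    ["not eligible", "super", "reportee"],
])
my_word = ["well", "manager"]
m = df.isin(my_word).any()  # True for columns 1 and 2

# Select the matching columns, convert to a 2-D array, flatten in row-major order
flat_list = df.loc[:, m].to_numpy().ravel().tolist()
print(flat_list)
# ['good', 'employee', 'best', 'employer', 'well', 'manager', 'super', 'reportee']
```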
https://en.xdnf.cn/q/72299.html
