I have a Pandas DataFrame and I want to find all rows where the i'th column values are 10 times greater than other columns.
Here is an example of my DataFrame:
For example, looking at column i=0, row B (0.344) its is 10x greater than values in the same row but in other columns (0.001, 0, 0.009, 0). So I would like:
my_list_0=[False,True,False,False,False,False,False,False,False,False,False]
The number of columns might change hence I don't want a solution like:
#This is good only for a DataFrame with 4 columns.
my_list_i = data.loc[(data.iloc[:,i]>10*data.iloc[:,(i+1)%num_cols]) &(data.iloc[:,i]>10*data.iloc[:,(i+2)%num_cols]) &(data.iloc[:,i]>10*data.iloc[:,(i+3)%num_cols])]
Any idea?
thanks.
Given the df:
df = pd.DataFrame({'cell1':[0.006209, 0.344955, 0.004521, 0, 0.018931, 0.439725, 0.013195, 0.009045, 0, 0.02614, 0],'cell2':[0.048043, 0.001077, 0,0.010393, 0.031546, 0.287264, 0.016732, 0.030291, 0.016236, 0.310639,0], 'cell3':[0,0,0.020238, 0, 0.03811, 0.579348, 0.005906, 0,0,0.068352, 0.030165],'cell4':[0.016139, 0.009359, 0,0,0.025449, 0.47779, 0, 0.01282, 0.005107, 0.004846, 0],'cell5': [0,0,0,0.012075, 0.031668, 0.520258, 0,0,0,2.728218, 0.013418]})
i = 0
You can use
(10 * df.drop(df.columns[i], axis=1)).lt(df.iloc[:,i], axis=0).all(1)
To get
0 False
1 True
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
dtype: bool
for any number of columns. This drops column i
, multiplies the remaining df by 10, and checks row-wise for being less than i
, then returns True
only if all values in the row are True
. So it returns a vector of True
for each row where this obtains and False
for others.
If you want to give an arbitrary threshold, you can sum the Trues
and divide by the number of columns - 1, then compare with your threshold:
thresh = 0.5 # or whatever you want
(10 * df.drop(df.columns[i], axis=1)).lt(df.iloc[:,i], axis=0).sum(1) / (df.shape[1] - 1) > thresh0 False
1 True
2 True
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
dtype: bool