Filtering Pandas DataFrame using a condition on column values that are numpy arrays

2024/10/11 13:22:07

I have a Pandas DataFrame called 'dt', which has two columns called 'A' and 'B'. The values of column 'B' are numpy arrays, something like this:

index   A   B
0       a   [1,2,3]
1       b   [2,3,4]
2       c   [3,4,5]

Where:

type(dt["B"][0])

returns: numpy.ndarray
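A minimal sketch for reproducing this setup, assuming the small integer arrays from the table above (the construction is an illustration, not the asker's original code):

```python
import numpy as np
import pandas as pd

# Illustrative reconstruction of 'dt': each cell of column 'B' holds
# a whole numpy array, so pandas stores 'B' as an object-dtype column.
dt = pd.DataFrame({
    "A": ["a", "b", "c"],
    "B": [np.array([1, 2, 3]), np.array([2, 3, 4]), np.array([3, 4, 5])],
})

print(type(dt["B"][0]))  # <class 'numpy.ndarray'>
print(dt["B"].dtype)     # object
```

Because each cell holds a whole array, pandas cannot use a numeric dtype for 'B' and falls back to the generic `object` dtype.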

I want to filter this DataFrame to get another DataFrame containing only the rows whose numpy array in 'B' contains a certain element.

I've tried this:

dt[element in dt["B"]]

So for example:

dt[2 in dt["B"]]

should return:

index   A   B
0       a   [1,2,3]
1       b   [2,3,4]

But this raises an error: "KeyError: True".

If the values of column "B" were strings, I could have done the same with no error:

dt[dt["B"] == value]
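This works because `==` on a column is evaluated element-wise and returns a boolean Series with one entry per row; `dt[...]` then keeps the rows where that Series is True. A sketch on the sample frame, using the string column 'A' and the illustrative value "b":

```python
import numpy as np
import pandas as pd

dt = pd.DataFrame({
    "A": ["a", "b", "c"],
    "B": [np.array([1, 2, 3]), np.array([2, 3, 4]), np.array([3, 4, 5])],
})

# Element-wise comparison: one True/False per row, aligned on the index.
mask = dt["A"] == "b"
print(mask.tolist())  # [False, True, False]

# A boolean Series inside [] selects the rows where it is True.
print(dt[mask])
```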

So I wonder why my code doesn't work, and what "KeyError: True" means.

The complete error is this:

KeyError                                  Traceback (most recent call last)
~/Applications/Conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: True

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-151-aa9ea046a48f> in <module>
----> 1 quotes_of_base["BTC" in quotes_of_base["quote"]]

~/Applications/Conda/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2798             if self.columns.nlevels > 1:
   2799                 return self._getitem_multilevel(key)
-> 2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

~/Applications/Conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: True
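The `KeyError: True` comes from two steps. First, `element in dt["B"]` never looks inside the arrays: Python's `in` on a pandas Series tests membership in the Series' index labels (much like `in` on a dict tests its keys) and returns a single bool. With the default index 0, 1, 2, `2 in dt["B"]` is True simply because 2 is an index label. Second, `dt[True]` treats that scalar as a column label, and since there is no column named True, pandas raises `KeyError: True` (a False result would fail the same way with `KeyError: False`). In the traceback, `"BTC" in quotes_of_base["quote"]` likewise collapsed to the single value True. A short sketch reproducing this on the sample frame:

```python
import numpy as np
import pandas as pd

dt = pd.DataFrame({
    "A": ["a", "b", "c"],
    "B": [np.array([1, 2, 3]), np.array([2, 3, 4]), np.array([3, 4, 5])],
})

# `in` on a Series tests the index labels (0, 1, 2), not the stored arrays.
print(2 in dt["B"])  # True  -> 2 is an index label
print(5 in dt["B"])  # False -> 5 is inside row 2's array, but is not a label

# The whole expression collapses to one bool, so dt[...] looks for a
# column literally named True and fails.
try:
    dt[2 in dt["B"]]
except KeyError as err:
    print("KeyError:", err)  # KeyError: True
```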
Answer
  • Let's say you have something like:

          A         B
      0  10   [11, 0]
      1  20  [11, 10]
      2  30  [11, 10]
      3  40   [10, 0]
      4  50   [11, 0]
      5  60   [10, 0]
    
  • And you would like to keep only the rows whose list in 'B' contains 10:

          A         B
      1  20  [11, 10]
      2  30  [11, 10]
      3  40   [10, 0]
      5  60   [10, 0]
    
  • You can use .apply:

      import pandas as pd

      # create the dataframe; each cell of column 'B' holds a list
      df = pd.DataFrame({
          "A": [10, 20, 30, 40, 50, 60],
          "B": [[11, 0], [11, 10], [11, 10], [10, 0], [11, 0], [10, 0]],
      })

      # results is a boolean Series: True where 10 is found in the list in 'B'
      results = df.B.apply(lambda a: 10 in a)

      # filter the dataframe with the boolean mask
      df_filtered = df[results]
      print(df_filtered)
    
  • Then you get:

          A         B
      1  20  [11, 10]
      2  30  [11, 10]
      3  40   [10, 0]
      5  60   [10, 0]
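The same `.apply` approach should also work when column 'B' holds numpy arrays, as in the question, because `element in arr` on a 1-D ndarray checks whether any entry equals `element`. A sketch on the question's sample data; `rows_containing` is a hypothetical helper name, not a pandas function:

```python
import numpy as np
import pandas as pd

dt = pd.DataFrame({
    "A": ["a", "b", "c"],
    "B": [np.array([1, 2, 3]), np.array([2, 3, 4]), np.array([3, 4, 5])],
})

def rows_containing(df, column, element):
    """Return the rows of df whose array in `column` contains `element`."""
    # One bool per row: does this row's array contain the element?
    mask = df[column].apply(lambda arr: element in arr)
    return df[mask]

print(rows_containing(dt, "B", 2))  # rows 0 and 1
```

A list comprehension, `dt[[2 in arr for arr in dt["B"]]]`, builds the same boolean mask without `.apply`.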
    

You can find more details at: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
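If every array in 'B' has the same length (true for the sample data, but an assumption in general), a vectorized variant is possible: stack the arrays into one 2-D numpy array and test all rows at once. A sketch:

```python
import numpy as np
import pandas as pd

dt = pd.DataFrame({
    "A": ["a", "b", "c"],
    "B": [np.array([1, 2, 3]), np.array([2, 3, 4]), np.array([3, 4, 5])],
})

# Stack the per-row arrays into shape (n_rows, array_length).
# np.stack raises ValueError if the arrays have different lengths.
values = np.stack(dt["B"].to_numpy())

# Compare every entry with the element, then reduce each row with any().
mask = (values == 2).any(axis=1)
print(dt[mask])  # rows 0 and 1
```

For arrays of different lengths `np.stack` fails, so the `.apply` version remains the more general choice.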

https://en.xdnf.cn/q/118321.html
