Boolean mask in pandas Panel

2024/10/6 10:28:50

I am having some trouble masking a Panel in the same way that I would a DataFrame. What I want to do feels simple, but I have not found a way in the docs or on online forums. I have a simple example below:

import pandas
import numpy as np
import datetime
start_date = datetime.datetime(2009,3,1,6,29,59)
r = pandas.date_range(start_date, periods=12)
cols_1 = ['AAPL', 'AAPL', 'GOOG', 'GOOG', 'GS', 'GS']
cols_2 = ['close', 'rate', 'close', 'rate', 'close', 'rate']
dat = np.random.randn(12, 6)
dftst = pandas.DataFrame(dat, columns=pandas.MultiIndex.from_arrays([cols_1, cols_2], names=['ticker','field']), index=r)
pn = dftst.T.to_panel().transpose(2,0,1)
print pn
Out[14]: 
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 12 (major_axis) x 3 (minor_axis)
Items axis: close to rate
Major_axis axis: 2009-03-01 06:29:59 to 2009-03-12 06:29:59
Minor_axis axis: AAPL to GS

I now have a Panel object. If I take a slice along the items axis, I get a DataFrame:

close_p = pn['close']
print close_p
Out[16]: 
ticker                   AAPL      GOOG        GS
2009-03-01 06:29:59 -0.082203 -0.286354  1.227193
2009-03-02 06:29:59  0.340005 -0.688933 -1.505137
2009-03-03 06:29:59 -0.525567  0.321858 -0.035047
2009-03-04 06:29:59 -0.123549 -0.841781 -0.616523
2009-03-05 06:29:59 -0.407504  0.188372  1.311262
2009-03-06 06:29:59  0.272883  0.817179  0.584664
2009-03-07 06:29:59 -1.767227  1.168876  0.443096
2009-03-08 06:29:59 -0.685501 -0.534373 -0.063906
2009-03-09 06:29:59  0.851820  0.068740  0.566537
2009-03-10 06:29:59  0.390678 -0.012422 -0.152375
2009-03-11 06:29:59 -0.985585 -0.917705 -0.585091
2009-03-12 06:29:59  0.067498 -0.764343  0.497270

I can filter this data in two ways:

1) I create a mask and mask the data as follows:

msk = close_p > 0
close_p = close_p.mask(msk)

2) I can just index with the boolean condition directly:

close_p = close_p[close_p > 0]
Out[28]: 
ticker                   AAPL      GOOG        GS
2009-03-01 06:29:59       NaN       NaN  1.227193
2009-03-02 06:29:59  0.340005       NaN       NaN
2009-03-03 06:29:59       NaN  0.321858       NaN
2009-03-04 06:29:59       NaN       NaN       NaN
2009-03-05 06:29:59       NaN  0.188372  1.311262
2009-03-06 06:29:59  0.272883  0.817179  0.584664
2009-03-07 06:29:59       NaN  1.168876  0.443096
2009-03-08 06:29:59       NaN       NaN       NaN
2009-03-09 06:29:59  0.851820  0.068740  0.566537
2009-03-10 06:29:59  0.390678       NaN       NaN
2009-03-11 06:29:59       NaN       NaN       NaN
2009-03-12 06:29:59  0.067498       NaN  0.497270
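A note on the two options above: in pandas, DataFrame.mask(cond) replaces the entries where cond is True, while indexing with a boolean frame, df[cond], keeps them (the same as DataFrame.where(cond)). So option 1 as written blanks out the positive values, and the output printed above (positives kept) corresponds to option 2. A minimal sketch on a made-up frame; the names demo and cond are only for illustration and are not part of the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, not the data from the question
demo = pd.DataFrame({"AAPL": [-1.0, 2.0], "GOOG": [3.0, -4.0]})
cond = demo > 0

kept_positive = demo[cond]        # NaN where cond is False
same_as_where = demo.where(cond)  # equivalent to boolean indexing
kept_rest = demo.mask(cond)       # NaN where cond is True

print(kept_positive)
print(kept_rest)
```

If the goal is to keep the values that satisfy the condition, where (or boolean indexing) is the call to reach for; mask is its inverse.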

What I cannot figure out how to do is filter all of my data based on a mask without a for loop. I can do the following:

msk = (pn['rate'] > 0) & (pn['close'] > 0)
def mask_panel(pan, msk):
    for item in pan.items:
        pan[item] = pan[item].mask(msk)
    return pan

print pn['close']
Out[32]: 
ticker                   AAPL      GOOG        GS
2009-03-01 06:29:59 -0.082203 -0.286354  1.227193
2009-03-02 06:29:59  0.340005 -0.688933 -1.505137
2009-03-03 06:29:59 -0.525567  0.321858 -0.035047
2009-03-04 06:29:59 -0.123549 -0.841781 -0.616523
2009-03-05 06:29:59 -0.407504  0.188372  1.311262
2009-03-06 06:29:59  0.272883  0.817179  0.584664
2009-03-07 06:29:59 -1.767227  1.168876  0.443096
2009-03-08 06:29:59 -0.685501 -0.534373 -0.063906
2009-03-09 06:29:59  0.851820  0.068740  0.566537
2009-03-10 06:29:59  0.390678 -0.012422 -0.152375
2009-03-11 06:29:59 -0.985585 -0.917705 -0.585091
2009-03-12 06:29:59  0.067498 -0.764343  0.497270

mask_panel(pn, msk)
print pn['close']
Out[34]: 
ticker                   AAPL      GOOG        GS
2009-03-01 06:29:59 -0.082203 -0.286354       NaN
2009-03-02 06:29:59       NaN -0.688933 -1.505137
2009-03-03 06:29:59 -0.525567       NaN -0.035047
2009-03-04 06:29:59 -0.123549 -0.841781 -0.616523
2009-03-05 06:29:59 -0.407504       NaN       NaN
2009-03-06 06:29:59       NaN       NaN       NaN
2009-03-07 06:29:59 -1.767227       NaN       NaN
2009-03-08 06:29:59 -0.685501 -0.534373 -0.063906
2009-03-09 06:29:59       NaN       NaN       NaN
2009-03-10 06:29:59       NaN -0.012422 -0.152375
2009-03-11 06:29:59 -0.985585 -0.917705 -0.585091
2009-03-12 06:29:59       NaN -0.764343       NaN

So the above loop does the trick. I know there is a faster vectorized way of doing this using the ndarray, but I have not put that together yet. It also seems like this should be functionality that is built into the pandas library. If there is a way to do this that I am missing, any suggestions would be much appreciated.

Answer

I think this will work (and it is what Panel.where should do, but that's a bit non-trivial because it has to handle a bunch of cases):

# construct the mask in 2-d (a frame)
In [36]: mask = (pn['close']>0) & (pn['rate']>0)

In [37]: mask
Out[37]: 
ticker                AAPL   GOOG     GS
2009-03-01 06:29:59  False  False  False
2009-03-02 06:29:59  False  False   True
....

# here's the key, this broadcasts, setting the values which 
# don't meet the condition to nan
In [38]: masked_values = np.where(mask,pn.values,np.nan)

# reconstruct the panel (the _construct_axes_dict is an internal function that returns
# dict of the axes, e.g. items -> the items, major_axis -> .....)
In [42]: x = pd.Panel(masked_values,**pn._construct_axes_dict())
Out[42]: 
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 12 (major_axis) x 3 (minor_axis)
Items axis: close to rate
Major_axis axis: 2009-03-01 06:29:59 to 2009-03-12 06:29:59
Minor_axis axis: AAPL to GS

# the values
In [43]: x.values
Out[43]: 
array([[[        nan,         nan,         nan],
        [        nan,         nan,  0.09575723],
        [        nan,         nan,         nan],
        [        nan,         nan,         nan],
        [        nan,  2.07229823,  0.04347515],
        [        nan,         nan,         nan],
        [        nan,         nan,  2.18342239],
        [        nan,         nan,  1.73674381],
        [        nan,  2.01173087,         nan],
        [ 0.24109645,  0.94583072,         nan],
        [ 0.36953467,         nan,  0.18044432],
        [ 1.74164222,  1.02314752,  1.73736033]],

       [[        nan,         nan,         nan],
        [        nan,         nan,  0.06960387],
        [        nan,         nan,         nan],
        [        nan,         nan,         nan],
        [        nan,  0.63202199,  0.56724391],
        [        nan,         nan,         nan],
        [        nan,         nan,  0.71964824],
        [        nan,         nan,  1.03482927],
        [        nan,  0.18256148,         nan],
        [ 1.29451667,  0.49804327,         nan],
        [ 2.04726538,         nan,  0.12883128],
        [ 0.70647885,  0.7277734 ,  0.77844475]]])
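Panel was deprecated and has since been removed from pandas (it is gone as of 0.25), so the session above only runs on old versions. The broadcasting step itself does not depend on Panel, though. Here is a minimal sketch on a plain NumPy array, assuming the data is laid out as (items, major_axis, minor_axis) like pn.values in the question. The random data and the names values, mask2d and masked are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for pn.values: 2 items (close, rate) x 12 dates x 3 tickers
values = rng.standard_normal((2, 12, 3))

# 2-D mask built from both items, shape (12, 3)
mask2d = (values[0] > 0) & (values[1] > 0)

# The (12, 3) mask broadcasts across the leading items axis of (2, 12, 3);
# entries where the mask is False become NaN in every item
masked = np.where(mask2d, values, np.nan)

print(masked.shape)
```

This keeps the entries where the mask is True, as in the answer. To reproduce the question's mask_panel loop, which blanks the entries where the mask is True, pass ~mask2d as the condition instead.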