I am working on code that applies a rolling window to a function that returns multiple columns.
Input: Pandas Series
Expected output: 3-column DataFrame
def fun1(series):
    # Some calculations producing numbers a, b and c
    return {"a": a, "b": b, "c": c}

res.rolling('21 D').apply(fun1)
Contents of res:
time
2019-09-26 16:00:00 0.674969
2019-09-26 16:15:00 0.249569
2019-09-26 16:30:00 -0.529949
2019-09-26 16:45:00 -0.247077
2019-09-26 17:00:00 0.390827
...
2019-10-17 22:45:00 0.232998
2019-10-17 23:00:00 0.590827
2019-10-17 23:15:00 0.768991
2019-10-17 23:30:00 0.142661
2019-10-17 23:45:00 -0.555284
Length: 1830, dtype: float64
Error:
TypeError: must be real number, not dict
What I've tried:
- Setting raw=True in apply
- Using a lambda function in apply
- Returning the result of fun1 as a list / NumPy array / DataFrame / Series.
I have also gone through many related posts on SO, to name a few:
- Pandas - Using `.rolling()` on multiple columns
- Returning two values from pandas.rolling_apply
- How to apply a function to two columns of Pandas dataframe
- Apply pandas function to column to create multiple new columns?
But none of the solutions there solve this problem.
Is there a straightforward solution to this?
Here is a hacky answer using `rolling`, producing a DataFrame:
import pandas as pd
import numpy as np

dr = pd.date_range('09-26-2019', '10-17-2019', freq='15T')
data = np.random.rand(len(dr))
s = pd.Series(data, index=dr)

output = pd.DataFrame(columns=['a', 'b', 'c'])
row = 0  # global counter: which row of output to write next

def compute(window, df):
    global row
    a = window.max()
    b = window.min()
    c = a - b
    df.loc[row, ['a', 'b', 'c']] = [a, b, c]
    row += 1
    return 1  # dummy value; rolling().apply() requires a number

s.rolling('1D').apply(compute, kwargs={'df': output})
output.index = s.index
It seems like `rolling().apply()` always expects the function to return a single number, so that it can immediately build a new Series from the results.
I am getting around this by making a new `output` DataFrame (with the desired output columns) and writing to it from within the function. I'm not sure if there is a way to get the index within a rolling object, so I instead use `global` to keep an increasing counter for writing new rows. In light of the point above, though, you still need to return some number. So while the actual `rolling` operation returns a Series of `1`s, `output` is modified:
In [0]: s
Out[0]:
2019-09-26 00:00:00    0.106208
2019-09-26 00:15:00    0.979709
2019-09-26 00:30:00    0.748573
2019-09-26 00:45:00    0.702593
2019-09-26 01:00:00    0.617028
                         ...
2019-10-16 23:00:00    0.742230
2019-10-16 23:15:00    0.729797
2019-10-16 23:30:00    0.094662
2019-10-16 23:45:00    0.967469
2019-10-17 00:00:00    0.455361
Freq: 15T, Length: 2017, dtype: float64

In [1]: output
Out[1]:
                            a         b         c
2019-09-26 00:00:00  0.106208  0.106208  0.000000
2019-09-26 00:15:00  0.979709  0.106208  0.873501
2019-09-26 00:30:00  0.979709  0.106208  0.873501
2019-09-26 00:45:00  0.979709  0.106208  0.873501
2019-09-26 01:00:00  0.979709  0.106208  0.873501
...                       ...       ...       ...
2019-10-16 23:00:00  0.980544  0.022601  0.957943
2019-10-16 23:15:00  0.980544  0.022601  0.957943
2019-10-16 23:30:00  0.980544  0.022601  0.957943
2019-10-16 23:45:00  0.980544  0.022601  0.957943
2019-10-17 00:00:00  0.980544  0.022601  0.957943

[2017 rows x 3 columns]
This feels more like an exploit of `rolling` than an intended use, so I would be interested to see a more elegant answer.
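One possibly cleaner sketch avoids both the `global` counter and the dummy return value, assuming a recent pandas where `Rolling` objects can be iterated (iteration was added around version 1.1). Each iteration yields the window as a Series, so a function can return a dict per window and the DataFrame is built from the list of dicts afterwards. `fun1` below is a hypothetical stand-in for the question's function, reusing the max/min/range calculation from above, and the data is random like in the answer.

```python
import numpy as np
import pandas as pd

# Same kind of data as above: 15-minute samples over three weeks.
dr = pd.date_range('2019-09-26', '2019-10-17', freq='15min')
s = pd.Series(np.random.rand(len(dr)), index=dr)


def fun1(window):
    # Hypothetical stand-in for the question's function: any per-window
    # calculation that returns several named values works here.
    a = window.max()
    b = window.min()
    return {'a': a, 'b': b, 'c': a - b}


# Iterating a Rolling object yields each window as a Series, one per row of s,
# using the same window bounds as the built-in rolling aggregations.
rows = [fun1(window) for window in s.rolling('1D')]
result = pd.DataFrame(rows, index=s.index)
```

This still calls Python once per window, so it is not faster than `apply`; it just lets the function return several values. If every output is a built-in statistic, `s.rolling('1D').agg(['max', 'min'])` already returns a multi-column DataFrame without any custom function.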
UPDATE: Thanks to @JuanPi, you can get the rolling window index using this answer. So a non-`global` answer could look like this:
def compute(window, df):
    a = window.max()
    b = window.min()
    c = a - b
    # with raw=False, window is a Series, so its last timestamp labels the row
    df.loc[window.index.max(), ['a', 'b', 'c']] = [a, b, c]
    return 1
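A self-contained sketch of how the non-`global` version might be driven, assuming `raw=False` (the default in recent pandas) so that `window` is a Series carrying its timestamp index. Because each row is written at `window.index.max()`, i.e. the timestamp of the window's last sample (the row currently being computed), `output` ends up indexed by time directly and the `output.index = s.index` step is no longer needed. The `astype(float)` at the end is an addition: the columns of an empty DataFrame start out as `object` dtype.

```python
import numpy as np
import pandas as pd

dr = pd.date_range('2019-09-26', '2019-10-17', freq='15min')
s = pd.Series(np.random.rand(len(dr)), index=dr)

output = pd.DataFrame(columns=['a', 'b', 'c'])


def compute(window, df):
    a = window.max()
    b = window.min()
    c = a - b
    # The window's last timestamp is the row currently being computed.
    df.loc[window.index.max(), ['a', 'b', 'c']] = [a, b, c]
    return 1  # rolling().apply() still needs a scalar back


# raw=False passes each window as a Series, so window.index is available.
s.rolling('1D').apply(compute, raw=False, kwargs={'df': output})

# Columns were created as object dtype on the empty frame; convert afterwards.
output = output.astype(float)
```

Writing by label rather than by counter also means that if pandas ever called the function twice for the same window, the row would simply be overwritten instead of shifting all later rows.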