Pandas apply on rolling with multi-column output

2024/10/3 6:28:58

I am working on code that applies a function returning multiple columns over a rolling window.

Input: Pandas Series
Expected output: 3-column DataFrame

def fun1(series):
    # Some calculations producing numbers a, b and c
    return {"a": a, "b": b, "c": c}

res.rolling('21 D').apply(fun1)

Contents of res:

time
2019-09-26 16:00:00    0.674969
2019-09-26 16:15:00    0.249569
2019-09-26 16:30:00   -0.529949
2019-09-26 16:45:00   -0.247077
2019-09-26 17:00:00    0.390827
                         ...   
2019-10-17 22:45:00    0.232998
2019-10-17 23:00:00    0.590827
2019-10-17 23:15:00    0.768991
2019-10-17 23:30:00    0.142661
2019-10-17 23:45:00   -0.555284
Length: 1830, dtype: float64

Error:

TypeError: must be real number, not dict

What I've tried:

  • Setting raw=True in apply
  • Using a lambda function in apply
  • Returning the result from fun1 as a list/numpy array/DataFrame/Series.

I have also gone through many related posts on SO, to name a few:

  • Pandas - Using `.rolling()` on multiple columns
  • Returning two values from pandas.rolling_apply
  • How to apply a function to two columns of Pandas dataframe
  • Apply pandas function to column to create multiple new columns?

But none of the solutions there solves this problem.

Is there a straightforward solution to this?

Answer

Here is a hacky answer using rolling, producing a DataFrame:

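import pandas as pd
import numpy as np

# 15-minute sample data ('15min'; older pandas also accepts '15T')
dr = pd.date_range('09-26-2019', '10-17-2019', freq='15min')
data = np.random.rand(len(dr))
s = pd.Series(data, index=dr)

# Empty frame that compute() fills in row by row as a side effect
output = pd.DataFrame(columns=['a','b','c'])
row = 0

def compute(window, df):
    global row
    a = window.max()
    b = window.min()
    c = a - b
    df.loc[row,['a','b','c']] = [a,b,c]
    row += 1
    return 1  # rolling apply needs a scalar; the real results live in df

s.rolling('1D').apply(compute, kwargs={'df':output})
output.index = s.index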

It seems that rolling apply always expects the function to return a single number, which it uses to build the new Series immediately.
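To illustrate that constraint, here is a small sketch on a toy Series (the data and lambdas are illustrative, not taken from the question): a function returning a scalar works, while one returning a dict raises the same kind of TypeError reported in the question.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5, dtype=float))

# Returning a scalar works: one float per window position.
ok = s.rolling(3).apply(lambda w: w.max() - w.min(), raw=True)

# Returning a dict (or list/array) does not: pandas stores each result
# in a float64 output array, so the assignment raises TypeError.
try:
    s.rolling(3).apply(lambda w: {"a": w.max()}, raw=True)
    dict_error = None
except TypeError as exc:
    dict_error = exc

print(ok.tolist())  # [nan, nan, 2.0, 2.0, 2.0]
print(dict_error)
```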

I am getting around this by creating a new output DataFrame (with the desired output columns) and writing to it from within the function. I'm not sure whether there is a way to get the index within a rolling object, so I instead use a global counter that increments for each new row. Because of the point above, though, the function still has to return some number. So while the rolling operation itself returns a Series of 1s, output is filled in as a side effect:

In[0]: s
Out[0]:
2019-09-26 00:00:00    0.106208
2019-09-26 00:15:00    0.979709
2019-09-26 00:30:00    0.748573
2019-09-26 00:45:00    0.702593
2019-09-26 01:00:00    0.617028
                         ...   
2019-10-16 23:00:00    0.742230
2019-10-16 23:15:00    0.729797
2019-10-16 23:30:00    0.094662
2019-10-16 23:45:00    0.967469
2019-10-17 00:00:00    0.455361
Freq: 15T, Length: 2017, dtype: float64

In[1]: output
Out[1]:
                            a         b         c
2019-09-26 00:00:00  0.106208  0.106208  0.000000
2019-09-26 00:15:00  0.979709  0.106208  0.873501
2019-09-26 00:30:00  0.979709  0.106208  0.873501
2019-09-26 00:45:00  0.979709  0.106208  0.873501
2019-09-26 01:00:00  0.979709  0.106208  0.873501
...                       ...       ...       ...
2019-10-16 23:00:00  0.980544  0.022601  0.957943
2019-10-16 23:15:00  0.980544  0.022601  0.957943
2019-10-16 23:30:00  0.980544  0.022601  0.957943
2019-10-16 23:45:00  0.980544  0.022601  0.957943
2019-10-17 00:00:00  0.980544  0.022601  0.957943

[2017 rows x 3 columns]
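For this particular example (max, min and their difference), a side-effect-free alternative may be rolling's built-in aggregations, which already return a DataFrame when given a list of functions. This is a sketch on similar random data; it only helps when every output column can be expressed as a built-in rolling aggregation, not for an arbitrary fun1.

```python
import numpy as np
import pandas as pd

dr = pd.date_range('09-26-2019', '10-17-2019', freq='15min')
s = pd.Series(np.random.default_rng(0).random(len(dr)), index=dr)

# One call computes both aggregations; the result has columns 'max' and 'min'
# and shares the index of s, so no counter or index reassignment is needed.
output = s.rolling('1D').agg(['max', 'min']).rename(columns={'max': 'a', 'min': 'b'})
output['c'] = output['a'] - output['b']
print(output.head())
```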

This feels more like an exploit of rolling than its intended use, so I would be interested to see a more elegant answer.
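One possibly more general option, assuming pandas 1.1 or later (where Rolling objects became iterable), is to iterate over the windows directly and let the function return a dict. The fun1 below is a hypothetical stand-in for the question's function.

```python
import numpy as np
import pandas as pd

def fun1(window):
    # Hypothetical stand-in for the question's fun1: takes one window
    # (a Series) and returns a dict of scalars.
    a = window.max()
    b = window.min()
    return {"a": a, "b": b, "c": a - b}

dr = pd.date_range('09-26-2019', '10-17-2019', freq='15min')
s = pd.Series(np.random.default_rng(0).random(len(dr)), index=dr)

# Iterating a Rolling object yields one window (a Series) per row of s,
# so fun1 is free to return any number of values.
rows = [fun1(window) for window in s.rolling('1D')]
output = pd.DataFrame(rows, index=s.index)
print(output.head())
```

This still calls a Python function once per window, so it is not vectorized, but it needs neither a global counter nor a dummy return value.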

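UPDATE: Thanks to @JuanPi, you can get the rolling window index using this answer. So a non-global answer could look like this (since each row is written under the window's last timestamp, the final output.index = s.index step is no longer needed):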

def compute(window, df):
    a = window.max()
    b = window.min()
    c = a - b
    # Label the row with the window's last timestamp instead of a global counter
    df.loc[window.index.max(),['a','b','c']] = [a,b,c]
    return 1
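A sketch of the same non-global idea with one swap: results are collected in a plain dict keyed by each window's last timestamp (the results argument is a hypothetical name) and turned into a DataFrame at the end, rather than written into output row by row.

```python
import numpy as np
import pandas as pd

dr = pd.date_range('09-26-2019', '10-17-2019', freq='15min')
s = pd.Series(np.random.default_rng(0).random(len(dr)), index=dr)

def compute(window, results):
    # With raw=False the window is a Series, so its last timestamp is the
    # label of the row this trailing time-based window belongs to.
    a = window.max()
    b = window.min()
    results[window.index.max()] = {'a': a, 'b': b, 'c': a - b}
    return 1  # rolling apply still requires a scalar return value

results = {}
s.rolling('1D').apply(compute, raw=False, kwargs={'results': results})
output = pd.DataFrame.from_dict(results, orient='index')
print(output.head())
```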
https://en.xdnf.cn/q/70758.html
