Calculate a rolling regression in Pandas and store the slope

2024/9/16 23:36:03

I have some time series data and I want to calculate a groupwise rolling regression of the last n days in Pandas and store the slope of that regression in a new column.

I searched the older questions, but they either haven't been answered or use Pandas OLS, which I've heard is deprecated.

I figured that I could probably use df.rolling().apply() in combination with the scipy.stats.linregress function, but I can't figure out a lambda function that does what I want.
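Here is a minimal sketch of that idea on a single Series. It makes a few assumptions: the values are made up, and np.polyfit stands in for linregress only so the example needs nothing beyond NumPy. linregress(x, w).slope should give the same number. With raw=True, rolling().apply() passes each full window to the function as a NumPy array, so the lambda only has to regress the window against 0, 1, ..., n-1:

```python
import numpy as np
import pandas as pd

# made-up values for illustration
s = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])

# raw=True hands each full window to the lambda as a NumPy array;
# x is simply 0, 1, ..., len(window) - 1 and polyfit(..., 1)[0] is the fitted slope.
# Rows before the first full window (min_periods=3) come back as NaN.
slopes = s.rolling(window=3, min_periods=3).apply(
    lambda w: np.polyfit(np.arange(len(w)), w, 1)[0], raw=True
)
print(slopes.tolist())
```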

Here is some sample code:

import numpy as np
import pandas as pd
from scipy.stats import linregress
# make sample data
days = 21
groups = ['A', 'B', 'C']
data_days = list(range(days)) * len(groups)
values = np.random.rand(days*len(groups))
df = pd.DataFrame(data=zip(sorted(groups*days), data_days, values), columns=['group', 'day', 'value'])
# calculate slope of regression of last 7 days
days_back = 7
grouped_data = df.groupby('group')
for g, data in grouped_data:
    window = data.rolling(window=days_back, min_periods=days_back)

I need a new column called 'slope' that stores, for every row from the 7th day of its group onward, the slope of a linear regression through that group's last 7 days.
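For reference, each value in that column would be the ordinary least-squares slope of the window's n values y_0, ..., y_{n-1} against x_k = k. This is what linregress returns when x is np.arange(n). The closed form below is standard algebra and is not taken from the question:

```latex
b = \frac{\sum_{k=0}^{n-1} (x_k - \bar{x})(y_k - \bar{y})}{\sum_{k=0}^{n-1} (x_k - \bar{x})^2}
  = \frac{\sum_{k=0}^{n-1} \left(k - \tfrac{n-1}{2}\right) y_k}{n(n^2 - 1)/12},
\qquad x_k = k, \quad \bar{x} = \tfrac{n-1}{2}
```

For n = 7 the denominator is 28, so each slope is a fixed weighted sum of the last 7 values with weights (k - 3)/28.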

Answer

I had some wrong assumptions. First, I don't need to loop through the groups, and second, I didn't really understand how rolling.apply works.
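The small sketch below shows why no loop over the groups is needed. The data are made up, and a rolling sum stands in for the slope to keep the numbers easy to check. groupby().rolling() computes the windows separately for each group, so a window never spans two groups. Its result also carries an extra 'group' index level. reset_index(0, drop=True) removes that level so the values line up with the original rows on assignment:

```python
import numpy as np
import pandas as pd

# made-up data: two groups of three days each
df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "day": [0, 1, 2, 0, 1, 2],
    "value": [1.0, 2.0, 4.0, 10.0, 7.0, 4.0],
})

# windows restart in every group; the result is indexed by (group, original row index)
rolled = df.groupby("group")["value"].rolling(window=2, min_periods=2).sum()
print(rolled.index.tolist())

# dropping the 'group' level leaves the original row index, so assignment aligns per row
df["rolling_sum"] = rolled.reset_index(level=0, drop=True)
print(df)
```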

So here is the (seemingly) working code. I used the linregress function from scipy.stats:

import numpy as np
import pandas as pd
from scipy.stats import linregress
# create random sample data
days = 14
groups = ['A', 'B', 'C']
data_days = list(range(days)) * len(groups)
values = np.random.rand(days*len(groups))
df = pd.DataFrame(data=zip(sorted(groups*days), data_days, values), columns=['group', 'day', 'value'])
def get_slope(array):
    # regress the window's values against x = 0, 1, ..., len(window) - 1
    y = np.array(array)
    x = np.arange(len(y))
    slope, intercept, r_value, p_value, std_err = linregress(x, y)
    return slope
# calculate slope of regression of last 7 days
days_back = 7
# raw=True passes each window as a NumPy array (faster than building a Series per window);
# reset_index(0, drop=True) drops the 'group' index level so the result aligns with df's rows
df['rolling_slope'] = df.groupby('group')['value'].rolling(window=days_back, min_periods=days_back).apply(get_slope, raw=True).reset_index(0, drop=True)
print(df)
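Because x is always 0, 1, ..., n-1, the slope of every window is a fixed weighted sum of its values. This means the per-window Python call can be avoided. The sketch below is an alternative to the linregress approach and is not part of the original answer. rolling_slope is a hypothetical helper. The sketch assumes NumPy 1.20 or newer (for sliding_window_view), rows sorted by day within each group, no missing days, and made-up data. It cross-checks itself against a rolling().apply() version:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def rolling_slope(values, n):
    """Hypothetical helper: OLS slope of each length-n window against x = 0..n-1.

    Positions before the first full window are NaN, like min_periods=n.
    """
    y = np.asarray(values, dtype=float)
    out = np.full(len(y), np.nan)
    if len(y) >= n:
        # With centred x (which sums to zero) the slope is sum(xc * y) / sum(xc ** 2),
        # so every window reduces to one dot product.
        xc = np.arange(n) - (n - 1) / 2.0
        windows = sliding_window_view(y, n)  # shape (len(y) - n + 1, n)
        out[n - 1:] = windows @ xc / (xc ** 2).sum()
    return out


# made-up data; assumes rows are sorted by day within each group and no days are missing
df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5,
    "day": list(range(5)) * 2,
    "value": [1.0, 3.0, 2.0, 5.0, 4.0, 0.0, 2.0, 4.0, 6.0, 8.0],
})
days_back = 3

# transform keeps the original row index, so no reset_index is needed here
df["fast_slope"] = df.groupby("group")["value"].transform(
    lambda s: pd.Series(rolling_slope(s.to_numpy(), days_back), index=s.index)
)

# cross-check against the per-window rolling().apply() approach
df["apply_slope"] = (
    df.groupby("group")["value"]
    .rolling(window=days_back, min_periods=days_back)
    .apply(lambda w: np.polyfit(np.arange(len(w)), w, 1)[0], raw=True)
    .reset_index(level=0, drop=True)
)
print(df)
```

The trade-off is that this version only handles full windows, which is equivalent to min_periods equal to the window size. rolling().apply() can also work with a smaller min_periods.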
