Exponential Decay on Python Pandas DataFrame

2024/10/14 3:17:08

I'm trying to efficiently compute a running sum, with exponential decay, of each column of a Pandas DataFrame. The DataFrame contains a daily score for each country in the world. The DataFrame looks like this:

                AF        UK        US
2014-07-01  0.998042  0.595720  0.524698
2014-07-02  0.380649  0.838436  0.355149
2014-07-03  0.306240  0.274755  0.964524
2014-07-04  0.396721  0.836027  0.225848
2014-07-05  0.151291  0.677794  0.603548
2014-07-06  0.558846  0.050535  0.551785
2014-07-07  0.463514  0.552748  0.265537
2014-07-08  0.240282  0.278825  0.116432
2014-07-09  0.309446  0.096573  0.246021
2014-07-10  0.800977  0.583496  0.713893

I'm not sure how to calculate the rolling sum (with decay) without iterating through the dataframe, since I need to know yesterday's score to calculate today's score. But to calculate yesterday's score, I need to know the day before yesterday's score, etc. This is the code that I've been using, but I'd like a more efficient way to go about it.

for j, val in df.iteritems():
    for i, row in enumerate(val):
        df[j].iloc[i] = row + val[i-1]*np.exp(-0.05)

Answer

You can use the fact that when exponentials multiply their exponents add:

For example:

N(2) = N(2) + N(1) * exp(-0.05)
N(3) = N(3) + (N(2) + N(1) * exp(-0.05))*exp(-0.05)
N(3) = N(3) + N(2)*exp(-0.05) + N(1)*exp(-0.1)
N(4) = ...and so on
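
As a quick sanity check on the expansion above, here is a minimal sketch that computes the running total both ways for the first four AF scores from the question: once with the recurrence and once with the expanded weights. The variable names (`decay`, `recursive`, `expanded`) are illustrative and not part of the original answer.

```python
import numpy as np

decay = np.exp(-0.05)  # per-step decay factor used in the question
x = np.array([0.998042, 0.380649, 0.306240, 0.396721])  # first four AF scores

# Recurrence: each total is today's score plus yesterday's decayed total
recursive = np.empty_like(x)
recursive[0] = x[0]
for t in range(1, len(x)):
    recursive[t] = x[t] + recursive[t - 1] * decay

# Expanded form: a score that is k days old carries weight exp(-0.05 * k)
expanded = np.array([
    sum(x[k] * np.exp(-0.05 * (t - k)) for k in range(t + 1))
    for t in range(len(x))
])

# The two forms should agree up to floating-point rounding
print(np.allclose(recursive, expanded))
```

The expanded form is what makes vectorization possible: each output is a plain weighted sum of the inputs, with no dependence on a previously computed output.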

This can then be vectorized using numpy:

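import numpy as np
import pandas as pd

dataset = pd.DataFrame(np.random.rand(1000, 3), columns=["A", "B", "C"])
# A value k rows back is weighted by exp(-0.05 * k); the last entry (k = 0) is 1
weightspace = np.exp(-0.05 * np.arange(len(dataset) - 1, -1, -1))

def rollingsum(array):
    # Take the last len(array) weights so the newest value gets weight 1
    weights = weightspace[-len(array):]
    # Weighted sum of every value seen so far
    return np.dot(array, weights)

a = dataset.expanding().apply(rollingsum, raw=True)

dataset.expanding().apply (the replacement for pd.expanding_apply, which has since been removed from pandas) calls rollingsum once per row of each column, len(dataset) times per column, each time passing every value up to and including that row. weightspace holds one weight per lag: a value k rows before the current row is multiplied by exp(-0.05) k times, i.e. by exp(-0.05*k).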

Because it is vectorized, it should be fast:

%timeit a = pd.expanding_apply(dataset, rollingsum)
10 loops, best of 3: 25.5 ms per loop

This compares with (note I'm using python 3 and had to make a change to the behaviour on the first row...):

def multipleApply(df):
    for j, val in df.iteritems():
        for i, row in enumerate(val):
            if i == 0:
                continue
            df[j].iloc[i] = row + val[i-1]*np.exp(-0.05)

This comes out as:

In[68]: %timeit multipleApply(dataset)
1 loops, best of 3: 414 ms per loop
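
For comparison, the recurrence itself can be run in a single pass over the rows while updating all columns at once with NumPy. The sketch below is not from the original answer; `decayed_cumsum` and its `rate` parameter are hypothetical names, and it assumes every column is numeric. It does one vector operation per row, so the work grows linearly with the number of rows, whereas an expanding apply recomputes a weighted sum over an ever-growing window.

```python
import numpy as np
import pandas as pd

def decayed_cumsum(df, rate=0.05):
    """Running sum with exponential decay, one pass over the rows (hypothetical helper)."""
    decay = np.exp(-rate)
    values = df.to_numpy(dtype=float)
    out = np.empty_like(values)
    running = np.zeros(values.shape[1])  # one running total per column
    for t in range(values.shape[0]):
        # Today's total = today's scores + yesterday's totals, decayed one step
        running = values[t] + running * decay
        out[t] = running
    return pd.DataFrame(out, index=df.index, columns=df.columns)

dataset = pd.DataFrame(np.random.rand(1000, 3), columns=["A", "B", "C"])
result = decayed_cumsum(dataset)
```

The first row of `result` equals the first row of the input, since there is no earlier total to decay. No timing is claimed here; it would need to be measured on your own data.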