Expanding mean over multiple series in pandas

2024/10/12 20:22:19

I have a groupby object to which I apply an expanding mean. However, I want that calculation to run over another series/group at the same time. Here is my code:

import numpy as np
import pandas as pd

d = {'home': ['A', 'B', 'B', 'A', 'B', 'A', 'A'],
     'away': ['B', 'A', 'A', 'B', 'A', 'B', 'B'],
     'aw': [1, 0, 0, 0, 1, 0, np.nan],
     'hw': [0, 1, 0, 1, 0, 1, np.nan]}
df2 = pd.DataFrame(d, columns=['home', 'away', 'hw', 'aw'])
df2['tie'] = np.where(df2.hw == df2.aw, 1, 0)
df2.index = range(1, len(df2) + 1)

avgcol = ['hw', 'tie', 'aw']
homenames = ['home_win_at_home', 'home_tie_at_home', 'home_loss_at_home']
awaynames = ['away_win_at_away', 'away_tie_at_away', 'away_loss_at_away']

def win_at_venue(df, venuecol, avgcol, name):
    # Group by the venue column passed in, then take the prior-games expanding mean
    df[name] = df.groupby(venuecol)[avgcol].apply(lambda x: pd.expanding_mean(x).shift())

win_at_venue(df2, 'home', avgcol, homenames)
win_at_venue(df2, 'away', avgcol[::-1], awaynames)

How can I use pd.expanding_mean on a groupby object so that it averages over both the 'home' and 'away' columns, giving each team's average wins/ties/losses over all venues? Right now it only gives the prior win average for a team playing at home or away, not both home and away.

I've been trying different levels, df.stack(), and reindexing, but no luck.

Any help appreciated in getting there.

Here is the correct result for just home wins at home and home wins at all venues:

  home away  hw  aw  homewin_at_home  homewins_all_venues
0    A    B   0   1              NaN                  NaN
1    B    A   1   0              NaN                 1.00
2    B    A   0   0         1.000000                 1.00
3    A    B   1   0         0.000000                 0.00
4    B    A   0   1         0.500000                 0.50
5    A    B   1   0         0.500000                 0.40
6    A    B NaN NaN         0.666667                 0.50

Answer

You may have to introduce a 'team' column to follow a team's record irrespective of venue. The approach below could get you closer. Starting with:

d = {'home': ['A', 'B', 'B', 'A', 'B', 'A', 'A'],
     'away': ['B', 'A', 'A', 'B', 'A', 'B', 'B'],
     'aw': [1, 0, 0, 0, 1, 0, np.nan],
     'hw': [0, 1, 0, 1, 0, 1, np.nan]}
df = pd.DataFrame(d, columns=['home', 'away', 'hw', 'aw'])

To get:

  home away  hw  aw
0    A    B   0   1
1    B    A   1   0
2    B    A   0   0
3    A    B   1   0
4    B    A   0   1
5    A    B   1   0
6    A    B NaN NaN

Then number the games from 1 and name the index:

df.index = range(1, len(df) + 1)
df.index.name = 'game'

     home away  hw  aw
game                  
1       A    B   0   1
2       B    A   1   0
3       B    A   0   0
4       A    B   1   0
5       B    A   0   1
6       A    B   1   0
7       A    B NaN NaN

Next, stack so you can follow each team:

df = (df.set_index(['hw', 'aw'], append=True)
        .stack()
        .reset_index()
        .rename(columns={'level_3': 'role', 0: 'team'})
        .loc[:, ['game', 'team', 'role', 'hw', 'aw']])

    game team  role  hw  aw
0      1    A  home   0   1
1      1    B  away   0   1
2      2    B  home   1   0
3      2    A  away   1   0
4      3    B  home   0   0
5      3    A  away   0   0
6      4    A  home   1   0
7      4    B  away   1   0
8      5    B  home   0   1
9      5    A  away   0   1
10     6    A  home   1   0
11     6    B  away   1   0
12     7    A  home NaN NaN
13     7    B  away NaN NaN
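
As an alternative to the set_index/stack chain, the same reshape can be sketched with DataFrame.melt. This is only an illustration, not part of the original answer; the names wide and tidy are chosen here for the example, and the stable sort is an assumption made to reproduce the home-row-first ordering shown above.

```python
import numpy as np
import pandas as pd

# Same sample data as above, indexed by game number.
d = {'home': ['A', 'B', 'B', 'A', 'B', 'A', 'A'],
     'away': ['B', 'A', 'A', 'B', 'A', 'B', 'B'],
     'aw': [1, 0, 0, 0, 1, 0, np.nan],
     'hw': [0, 1, 0, 1, 0, 1, np.nan]}
wide = pd.DataFrame(d, columns=['home', 'away', 'hw', 'aw'])
wide.index = range(1, len(wide) + 1)
wide.index.name = 'game'

# melt turns the 'home' and 'away' columns into (role, team) pairs,
# keeping game, hw and aw as identifier columns.
tidy = wide.reset_index().melt(id_vars=['game', 'hw', 'aw'],
                               value_vars=['home', 'away'],
                               var_name='role', value_name='team')

# melt lists all home rows first, then all away rows. A stable sort on
# 'game' interleaves them while keeping home before away within a game.
tidy = (tidy.sort_values('game', kind='mergesort')
            .reset_index(drop=True)
            .loc[:, ['game', 'team', 'role', 'hw', 'aw']])
```

The resulting frame has one row per team per game, which is the shape the groupby step needs.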

Then, define what's a 'win', calculate overall record and apply expanding_mean:

def wins(row):
    if row['role'] == 'home':
        return row['hw']
    else:
        return row['aw']

df['wins'] = df.apply(wins, axis=1)
df['expanding_mean'] = df.groupby('team')['wins'].apply(lambda x: pd.expanding_mean(x).shift())

    game team  role  hw  aw  wins  expanding_mean
0      1    A  home   0   1     0             NaN
1      1    B  away   0   1     1             NaN
2      2    B  home   1   0     1        1.000000
3      2    A  away   1   0     0        0.000000
4      3    B  home   0   0     0        1.000000
5      3    A  away   0   0     0        0.000000
6      4    A  home   1   0     1        0.000000
7      4    B  away   1   0     0        0.666667
8      5    B  home   0   1     0        0.500000
9      5    A  away   0   1     1        0.250000
10     6    A  home   1   0     1        0.400000
11     6    B  away   1   0     0        0.400000
12     7    A  home NaN NaN   NaN        0.500000
13     7    B  away NaN NaN   NaN        0.333333
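
Newer pandas releases no longer provide pd.expanding_mean, so the step above may not run as written there. A minimal sketch of the same calculation with the .expanding() method follows. It assumes the long-format frame from the stacking step (rebuilt by hand here under the illustrative name tidy), and it swaps the row-wise wins() function for np.where and apply for transform, which are choices made for this example rather than the answer's own code.

```python
import numpy as np
import pandas as pd

# Long-format data as produced by the stacking step above.
tidy = pd.DataFrame({
    'game': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7],
    'team': list('ABBABAABBAABAB'),
    'role': ['home', 'away'] * 7,
    'hw': [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, np.nan, np.nan],
    'aw': [1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, np.nan, np.nan],
})

# Vectorised version of wins(): hw for the home side, aw for the away side.
tidy['wins'] = np.where(tidy['role'] == 'home', tidy['hw'], tidy['aw'])

# transform returns a result aligned to the original rows, so it can be
# assigned directly; shift() makes each row see only that team's earlier games.
tidy['expanding_mean'] = (
    tidy.groupby('team')['wins']
        .transform(lambda s: s.expanding().mean().shift())
)
```

The shift is what keeps the current game out of its own average, so the first appearance of each team is NaN.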

Since you have references for both games and teams, you could merge and filter to get your preferred layout.
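
One way the merge-and-filter step might look is sketched below. It assumes the per-team values computed above (copied in by hand under the illustrative name tidy) and a game-level frame called games; the output column name homewins_all_venues is borrowed from the question's expected result.

```python
import numpy as np
import pandas as pd

# Game-level data, one row per game.
games = pd.DataFrame({
    'game': range(1, 8),
    'home': ['A', 'B', 'B', 'A', 'B', 'A', 'A'],
    'away': ['B', 'A', 'A', 'B', 'A', 'B', 'B'],
})

# Per-team expanding means from the previous step (values from the table above).
tidy = pd.DataFrame({
    'game': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7],
    'team': list('ABBABAABBAABAB'),
    'role': ['home', 'away'] * 7,
    'expanding_mean': [np.nan, np.nan, 1.0, 0.0, 1.0, 0.0, 0.0,
                       2 / 3, 0.5, 0.25, 0.4, 0.4, 0.5, 1 / 3],
})

# Filter to the home side's rows, then merge back onto the games by game number.
home_part = (tidy.loc[tidy['role'] == 'home', ['game', 'expanding_mean']]
                 .rename(columns={'expanding_mean': 'homewins_all_venues'}))
result = games.merge(home_part, on='game', how='left')
```

Repeating the filter with role == 'away' and a different column name would add the away team's record in the same way.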

