Question 1

I have a groupby object I apply expanding mean to. However I want that calculation over another series/group at the same time. Here is my code:

d = { 'home' : ['A', 'B', 'B', 'A', 'B', 'A', 'A'], 'away' : ['B', 'A','A', 'B', 'A', 'B', 'B'],
'aw' : [1,0,0,0,1,0,np.nan],
'hw' : [0,1,0,1,0,1, np.nan]}df2 = pd.DataFrame(d, columns=['home', 'away', 'hw', 'aw'])
df2['tie'] = np.where(df2.hw == df2.aw, 1, 0)
df2.index = range(1,len(df2) + 1)avgcol = ['hw','tie','aw']
homenames = ['home_win_at_home', 'home_tie_at_home', 'home_loss_at_home']
awaynames = ['away_win_at_away', 'away_tie_at_away', 'away_loss_at_away']def win_at_venue(df, venuecol, avgcol, name):df[name] = df.groupby('away')[avgcol].apply(lambda x:pd.expanding_mean(x).shift())win_at_venue(df2, 'home', avgcol, homenames)
win_at_venue(df2, 'away', avgcol[::-1], awaynames)

How can I use pd.expanding_mean in a groupby object that will average over the 'home' and 'away' columns so I see their average wins/ties/losses over all venues? Now it just gives the prior win average for a team playing at home or away, not both home & away.

I've been trying different levels and df.stack() and reindexing but no luck.

Any help appreciated in getting there.

Here is what the correct result for just home wins at home and home wins all venues:

  home away  hw  aw  homewin_at_home  homewins_all_venues
0    A    B   0   1              NaN                  NaN
1    B    A   1   0              NaN                 1.00
2    B    A   0   0         1.000000                 1.00
3    A    B   1   0         0.000000                 0.00
4    B    A   0   1         0.500000                 0.50
5    A    B   1   0         0.500000                 0.40
6    A    B NaN NaN         0.666667                 0.50

Question 2

You may have to introduce a 'team' column to follow a team's record irrespective of venue. The below could get you closer. Starting with:

d = {'home': ['A', 'B', 'B', 'A', 'B', 'A', 'A'],'away': ['B', 'A', 'A', 'B', 'A', 'B', 'B'],'aw': [1, 0, 0, 0, 1, 0, np.nan],'hw': [0, 1, 0, 1, 0, 1, np.nan]}df = pd.DataFrame(d, columns=['home', 'away', 'hw', 'aw'])
df.index = range(1, len(df) + 1)
df.index.name = 'game'

To get:

  home away  hw  aw
0    A    B   0   1
1    B    A   1   0
2    B    A   0   0
3    A    B   1   0
4    B    A   0   1
5    A    B   1   0
6    A    B NaN NaNdf.index = range(1, len(df) + 1)
df.index.name = 'game'home away  hw  aw
game                  
1       A    B   0   1
2       B    A   1   0
3       B    A   0   0
4       A    B   1   0
5       B    A   0   1
6       A    B   1   0
7       A    B NaN NaN

Next, stack so you can follow each team:

df = df.set_index(['hw', 'aw'], append=True).stack().reset_index().rename(columns={'level_3': 'role', 0: 'team'}).loc[:,['game', 'team', 'role', 'hw', 'aw']]game team  role  hw  aw
0      1    A  home   0   1
1      1    B  away   0   1
2      2    B  home   1   0
3      2    A  away   1   0
4      3    B  home   0   0
5      3    A  away   0   0
6      4    A  home   1   0
7      4    B  away   1   0
8      5    B  home   0   1
9      5    A  away   0   1
10     6    A  home   1   0
11     6    B  away   1   0
12     7    A  home NaN NaN
13     7    B  away NaN NaN

Then, define what's a 'win', calculate overall record and apply expanding_mean:

def wins(row):if row['role'] == 'home':return row['hw']else:return row['aw']
df['wins'] = df.apply(wins, axis=1)df['expanding_mean'] = df.groupby('team')['wins'].apply(lambda x: pd.expanding_mean(x).shift())game team  role  hw  aw  wins  expanding_mean
0      1    A  home   0   1     0             NaN
1      1    B  away   0   1     1             NaN
2      2    B  home   1   0     1        1.000000
3      2    A  away   1   0     0        0.000000
4      3    B  home   0   0     0        1.000000
5      3    A  away   0   0     0        0.000000
6      4    A  home   1   0     1        0.000000
7      4    B  away   1   0     0        0.666667
8      5    B  home   0   1     0        0.500000
9      5    A  away   0   1     1        0.250000
10     6    A  home   1   0     1        0.400000
11     6    B  away   1   0     0        0.400000
12     7    A  home NaN NaN   NaN        0.500000
13     7    B  away NaN NaN   NaN        0.333333

Since you have references for both games and teams, you could merge and filter to get your preferred layout.

Expanding mean over multiple series in pandas

Related Q&A

Moving window sum on a boollean array, with steps.

Python - take the time difference from the first date in a column

(Django) Limited ForeignKey choices by Current User

Parse a custom text file in Python

Logs Dont Overwrite

How to upload local files to Firebase storage from Jupyter Notebook using Python

How can scrapy crawl more urls?

Pyplot - shift position of y-axis ticks and its data

How to exit a Python program or loop via keybind or macro? Keyboardinterrupt not working

SKlearn prediction on test dataset with different shape from training dataset shape