I have a groupby object I apply expanding mean to. However I want that calculation over another series/group at the same time. Here is my code:
d = { 'home' : ['A', 'B', 'B', 'A', 'B', 'A', 'A'], 'away' : ['B', 'A','A', 'B', 'A', 'B', 'B'],
'aw' : [1,0,0,0,1,0,np.nan],
'hw' : [0,1,0,1,0,1, np.nan]}df2 = pd.DataFrame(d, columns=['home', 'away', 'hw', 'aw'])
df2['tie'] = np.where(df2.hw == df2.aw, 1, 0)
df2.index = range(1,len(df2) + 1)avgcol = ['hw','tie','aw']
homenames = ['home_win_at_home', 'home_tie_at_home', 'home_loss_at_home']
awaynames = ['away_win_at_away', 'away_tie_at_away', 'away_loss_at_away']def win_at_venue(df, venuecol, avgcol, name):df[name] = df.groupby('away')[avgcol].apply(lambda x:pd.expanding_mean(x).shift())win_at_venue(df2, 'home', avgcol, homenames)
win_at_venue(df2, 'away', avgcol[::-1], awaynames)
How can I use pd.expanding_mean
in a groupby
object that will average over the 'home'
and 'away'
columns so I see their average wins/ties/losses over all venues? Now it just gives the prior win average for a team playing at home or away, not both home & away.
I've been trying different levels and df.stack() and reindexing but no luck.
Any help appreciated in getting there.
Here is what the correct result for just home wins at home and home wins all venues:
home away hw aw homewin_at_home homewins_all_venues
0 A B 0 1 NaN NaN
1 B A 1 0 NaN 1.00
2 B A 0 0 1.000000 1.00
3 A B 1 0 0.000000 0.00
4 B A 0 1 0.500000 0.50
5 A B 1 0 0.500000 0.40
6 A B NaN NaN 0.666667 0.50