Pandas groupby and MultiIndex

2024/4/14 22:06:00

Is there a way in pandas to group data by a MultiIndex? By this I mean passing to the groupby function not only the keys but also their possible values, so that the structure of the result is defined in advance.

import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one', 'two', 'one', 'two', 'two', 'two'], dtype=object)
c = np.array(['dull', 'shiny', 'dull', 'dull', 'dull', 'shiny', 'shiny'], dtype=object)
df = pd.DataFrame([a, b, c]).T
df.columns = ['a', 'b', 'c']
df.groupby(['a', 'b', 'c']).apply(len)
a    b    c
bar  one  dull     1
     two  dull     1
foo  one  dull     1
          shiny    1
     two  dull     1
          shiny    2

But what I actually want is the following:

# Note: the `labels` argument was renamed to `codes` in pandas 0.24.
mi = pd.MultiIndex(levels=[['foo', 'bar'], ['one', 'two'], ['dull', 'shiny']],
                   codes=[[0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 1, 1, 0, 0, 1, 1], [0, 1, 0, 1, 0, 1, 0, 1]])
df.groupby(['a', 'b', 'c'], multi_index = mi).apply(len)
a    b    c
bar  one  dull     1
          shiny    0
     two  dull     1
          shiny    0
foo  one  dull     1
          shiny    1
     two  dull     1
          shiny    2

The way I see it, this would require an additional wrapper around the groupby object. Or does this feature fit the pandas philosophy well enough to be included in the library itself?


Just reindex and fillna!

In [14]: df.groupby(['a', 'b', 'c']).size().reindex(index=mi).fillna(0)
foo  one  dull     1
          shiny    1
     two  dull     1
          shiny    2
bar  one  dull     1
          shiny    0
     two  dull     1
          shiny    0
dtype: float64
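
For anyone trying this on a recent pandas version, here is a minimal self-contained sketch of the same reindex approach. Two things in it are my own assumptions rather than part of the answer above: the index is built with pd.MultiIndex.from_product instead of explicit levels and codes, and the gaps are filled with reindex's fill_value argument instead of a separate fillna call, which should keep the counts as integers rather than floats. The variable name counts is just for illustration.

```python
import pandas as pd

# Same data as in the question, built directly as a DataFrame.
df = pd.DataFrame({
    "a": ["foo", "foo", "foo", "bar", "bar", "foo", "foo"],
    "b": ["one", "one", "two", "one", "two", "two", "two"],
    "c": ["dull", "shiny", "dull", "dull", "dull", "shiny", "shiny"],
})

# Every combination of the allowed values, in the order given here.
# (Assumption: from_product replaces the explicit levels/codes constructor.)
mi = pd.MultiIndex.from_product(
    [["foo", "bar"], ["one", "two"], ["dull", "shiny"]],
    names=["a", "b", "c"],
)

# Count rows per group, then align to the full index.
# Combinations that never occur in df get 0 instead of NaN.
counts = df.groupby(["a", "b", "c"]).size().reindex(mi, fill_value=0)
print(counts)
```

The result has one row for each of the eight combinations in mi, including the ones that do not appear in df.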

