Pandas groupby and MultiIndex

2024/4/14 22:06:00

Is there any way in pandas to group data by a MultiIndex? By this I mean passing the groupby function not only the keys, but the keys together with their values, to predefine the dataframe columns.

import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one', 'two', 'one', 'two', 'two', 'two'], dtype=object)
c = np.array(['dull', 'shiny', 'dull', 'dull', 'dull', 'shiny', 'shiny'], dtype=object)
df = pd.DataFrame([a, b, c]).T
df.columns = ['a', 'b', 'c']
df.groupby(['a', 'b', 'c']).apply(len)
a    b    c    
bar  one  dull     1
     two  dull     1
foo  one  dull     1
          shiny    1
     two  dull     1
          shiny    2

But what I actually want is the following:

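# Note: the `codes` argument was called `labels` before pandas 0.24.
mi = pd.MultiIndex(levels=[['foo', 'bar'], ['one', 'two'], ['dull', 'shiny']],
                   codes=[[0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 1, 1, 0, 0, 1, 1], [0, 1, 0, 1, 0, 1, 0, 1]])
# `multi_index` is not an existing groupby parameter; it is the API I would like to have.
df.groupby(['a', 'b', 'c'], multi_index = mi).apply(len)
a    b    c    
bar  one  dull     1
          shiny    0
     two  dull     1
          shiny    0
foo  one  dull     1
          shiny    1
     two  dull     1
          shiny    2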

The way I see it, this could be done with an additional wrapper around the groupby object. Or maybe this feature fits the pandas philosophy well and could be included in the pandas library?


Just reindex and fillna!

In [14]: df.groupby(['a', 'b', 'c']).size().reindex(index=mi).fillna(0)
foo  one  dull     1
          shiny    1
     two  dull     1
          shiny    2
bar  one  dull     1
          shiny    0
     two  dull     1
          shiny    0
dtype: float64
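
The result above is float64 because reindex first introduces NaN for the missing groups and fillna then replaces them. A minimal sketch of a variant that should keep integer counts, assuming a pandas version where `MultiIndex.from_product` and the `fill_value` argument of `Series.reindex` are available (the helper name `group_sizes` is made up for illustration and is not part of pandas):

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["foo", "foo", "foo", "bar", "bar", "foo", "foo"],
    "b": ["one", "one", "two", "one", "two", "two", "two"],
    "c": ["dull", "shiny", "dull", "dull", "dull", "shiny", "shiny"],
})

# Every combination of the key values, in the order given.
mi = pd.MultiIndex.from_product(
    [["foo", "bar"], ["one", "two"], ["dull", "shiny"]],
    names=["a", "b", "c"],
)


def group_sizes(frame, keys, full_index):
    """Hypothetical helper: group sizes aligned to full_index.

    Groups that do not occur in the data get a count of 0.
    """
    # fill_value=0 fills missing groups directly, so no NaN is created
    # and the counts stay integers.
    return frame.groupby(keys).size().reindex(full_index, fill_value=0)


counts = group_sizes(df, ["a", "b", "c"], mi)
print(counts)
```

This is essentially the wrapper around groupby that the question mentions, built on top of the reindex approach rather than a new groupby parameter.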
