Python pandas idxmax for multiple indexes in a dataframe

2024/10/6 8:29:47

I have a series that looks like this:

            delivery
2007-04-26  706           23
2007-04-27  705           10
            706         1089
            708           83
            710           13
            712           51
            802            4
            806            1
            812            3
2007-04-29  706           39
            708            4
            712            1
2007-04-30  705            3
            706         1016
            707            2
...
2014-11-04  1412          53
            1501           1
            1502           1
            1512           1
2014-11-05  1411          47
            1412        1334
            1501          40
            1502         433
            1504         126
            1506         100
            1508           7
            1510           6
            1512          51
            1604           1
            1612           5
Length: 26255, dtype: int64

where the query is: df.groupby([df.index.date, 'delivery']).size()

For each day, I need to pull out the delivery number which has the most volume. I feel like it would be something like:

df.groupby([df.index.date, 'delivery']).size().idxmax(axis=1)

However, this just returns the idxmax for the entire dataframe. Instead, I need the second-level idxmax (the delivery number, not the date) for each day, i.e. the result should be a vector with one entry per day.

Any ideas on how to accomplish this?

Answer

Your example code doesn't work because idxmax is applied to the result of the groupby operation, which is a single Series, so it returns the label of the maximum over the whole dataset rather than one per day.
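A minimal sketch of that behaviour, assuming a small made-up frame (the name `events` and its values are hypothetical, not the asker's data):

```python
import pandas as pd

# Hypothetical toy data: one row per delivery event, indexed by timestamp.
events = pd.DataFrame(
    {"delivery": [706, 705, 706, 706, 708]},
    index=pd.to_datetime(
        ["2007-04-26 09:00", "2007-04-27 09:00", "2007-04-27 10:00",
         "2007-04-27 11:00", "2007-04-27 12:00"]
    ),
)

# Same shape as the series in the question: (date, delivery) -> row count.
counts = events.groupby([events.index.date, "delivery"]).size()

# idxmax sees one flat Series, so it returns a single (date, delivery) label
# for the global maximum, not one label per day.
overall = counts.idxmax()
print(overall)
```

Here `overall` is one tuple for the busiest (date, delivery) pair in the whole series, which is why a per-day step is still needed.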

I'm not sure how to use idxmax on multilevel indexes, so here's a simple workaround.

Setting up data:

import pandas as pd

d = {'Date': ['2007-04-26', '2007-04-27', '2007-04-27', '2007-04-27', '2007-04-27', '2007-04-28', '2007-04-28'],
     'DeliveryNb': [706, 705, 708, 450, 283, 45, 89],
     'DeliveryCount': [23, 10, 1089, 82, 34, 100, 11]}
df = pd.DataFrame.from_dict(d, orient='columns').set_index('Date')
print(df)

output

            DeliveryCount  DeliveryNb
Date                                 
2007-04-26             23         706
2007-04-27             10         705
2007-04-27           1089         708
2007-04-27             82         450
2007-04-27             34         283
2007-04-28            100          45
2007-04-28             11          89

creating custom function :

The trick is to use the reset_index() method, so that idxmax() returns the integer position of the row within the group rather than its (repeated) date label.

def func(df):
    # reset_index() gives the group a 0..n-1 index, so idxmax() is a row position
    idx = df.reset_index()['DeliveryCount'].idxmax()
    return df['DeliveryNb'].iloc[idx]
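To see why the reset_index() step matters, here is a small sketch on a single group. The frame `group` is a hypothetical stand-in for one day's rows, built by hand rather than taken from the groupby:

```python
import pandas as pd

# One day's worth of rows, all sharing the same index label (as in a group).
group = pd.DataFrame(
    {"DeliveryNb": [705, 708, 450], "DeliveryCount": [10, 1089, 82]},
    index=pd.Index(["2007-04-27"] * 3, name="Date"),
)

# On the raw group, idxmax returns the label, which is the same for every row.
label = group["DeliveryCount"].idxmax()

# After reset_index, the index is 0..n-1, so idxmax is a usable row position.
position = group.reset_index()["DeliveryCount"].idxmax()
best = group["DeliveryNb"].iloc[position]
print(label, position, best)
```

Without the reset, the returned label cannot tell the rows of the day apart, so it cannot be passed to iloc.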

applying it :

g = df.groupby(df.index)
g.apply(func)

result :

Date
2007-04-26    706
2007-04-27    708
2007-04-28     45
dtype: int64
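
As a possible alternative that avoids a custom function, the same result can probably be obtained by resetting the index once and using the groupby idxmax to select rows. This is only a sketch under the assumption that the data has the same columns as above; the names `flat`, `best_rows` and `result` are illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": ["2007-04-26", "2007-04-27", "2007-04-27", "2007-04-27",
                 "2007-04-27", "2007-04-28", "2007-04-28"],
        "DeliveryNb": [706, 705, 708, 450, 283, 45, 89],
        "DeliveryCount": [23, 10, 1089, 82, 34, 100, 11],
    }
).set_index("Date")

# Move Date into a column so every row gets a unique integer label.
flat = df.reset_index()

# Per date, the row label of the largest DeliveryCount.
best_rows = flat.groupby("Date")["DeliveryCount"].idxmax()

# Look those rows up and keep only the delivery number, keyed by date.
result = flat.loc[best_rows].set_index("Date")["DeliveryNb"]
print(result)
```

The unique integer index is what makes the `.loc` lookup unambiguous; with the original repeated date labels, it would return every row of the day.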