Python pandas: select 2nd smallest value in groupby

2024/9/20 12:00:14

I have an example DataFrame like the following:

import pandas as pd
import numpy as np
df = pd.DataFrame({'ID': [1, 2, 2, 2, 3, 3], 'date': np.array(['2000-01-01', '2002-01-01', '2010-01-01', '2003-01-01', '2004-01-01', '2008-01-01'], dtype='datetime64[D]')})

I am trying to get the 2nd earliest date in each ID group, so I wrote the following function:

def f(x):
    if len(x) == 1:
        return x[0]
    else:
        x.sort()
        return x[1]

And then I wrote:

df.groupby('ID').date.apply(lambda x:f(x))

The result is an error.

Could you find a way to make this work?

Answer

This requires pandas 0.14.1 or later. It is also quite efficient, especially if you have large groups, because nsmallest does not need to fully sort each group.

In [32]: df.groupby('ID')['date'].nsmallest(2)
Out[32]: 
ID   
1   0   2000-01-01
2   1   2002-01-01
    3   2003-01-01
3   4   2004-01-01
    5   2008-01-01
dtype: datetime64[ns]

In [33]: df.groupby('ID')['date'].nsmallest(2).groupby(level='ID').last()
Out[33]: 
ID
1    2000-01-01
2    2003-01-01
3    2008-01-01
dtype: datetime64[ns]
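
As for why the original function fails: the question does not show the traceback, so this is an assumption, but two things in f look suspect. On a Series with an integer index, x[0] and x[1] are looked up by label rather than by position, and the group for ID 3 only has the labels 4 and 5. In addition, Series.sort() was removed from later pandas releases. The sketch below is one possible rewrite (the name second_smallest is made up for illustration) that uses sort_values() and the positional .iloc accessor, and compares it with the nsmallest approach:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 2, 2, 3, 3],
    'date': pd.to_datetime(['2000-01-01', '2002-01-01', '2010-01-01',
                            '2003-01-01', '2004-01-01', '2008-01-01']),
})


def second_smallest(s):
    """Return the 2nd smallest value of a Series, or its only value
    when the Series has a single element (hypothetical helper)."""
    ordered = s.sort_values()
    # .iloc is positional, so it does not depend on the group's index labels.
    return ordered.iloc[min(1, len(ordered) - 1)]


# Rewrite of the question's approach: one Python call per group.
via_apply = df.groupby('ID')['date'].apply(second_smallest)

# The approach from the answer: keep the 2 smallest per group, then take the last.
via_nsmallest = (
    df.groupby('ID')['date']
      .nsmallest(2)
      .groupby(level='ID')
      .last()
)

print(via_apply)
print(via_nsmallest)
```

Both give 2000-01-01, 2003-01-01 and 2008-01-01 for IDs 1, 2 and 3. The apply version sorts every group in full, so the nsmallest version is probably the better choice for large groups.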