How to apply different aggregation functions to the same column using pandas GroupBy

2024/9/25 6:29:01

It is clear that calling

 data.groupby(['A','B']).mean()

returns a result with a MultiIndex on levels 'A' and 'B' and a column holding the mean of each group.

How can I get count() and std() at the same time,

so that the result is a DataFrame that looks like this:

A   B    mean   count   std

Answer

The following should work:

data.groupby(['A','B']).agg([pd.Series.mean, pd.Series.std, pd.Series.count])

Basically, calling agg with a list of functions generates one column per function, each holding that function's result for every group.

Example:

In [12]:
df = pd.DataFrame({'a':np.random.randn(5), 'b':[0,0,1,1,2]})
df.groupby(['b']).agg([pd.Series.mean, pd.Series.std, pd.Series.count])

Out[12]:
          a
       mean       std count
b
0 -0.769198  0.158049     2
1  0.247708  0.743606     2
2 -0.312705       NaN     1

You can also pass the method names as strings. The common ones work; some of the more obscure ones don't (I can't remember which), but in this case they work fine. Thanks to @ajcr for the suggestion:

In [16]:
df = pd.DataFrame({'a':np.random.randn(5), 'b':[0,0,1,1,2]})
df.groupby(['b']).agg(['mean', 'std', 'count'])

Out[16]:
          a
       mean       std count
b
0 -1.037301  0.790498     2
1 -0.495549  0.748858     2
2 -0.644818       NaN     1
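
If you want the flat layout from the question (group keys followed by mean, std and count as ordinary columns), one possible approach is named aggregation followed by reset_index(). This is only a sketch: the fixed values in 'a' and the names multi and summary are made up for illustration, and it assumes a pandas version that supports named aggregation (0.25 or later).

```python
import pandas as pd

# Illustrative data with fixed values so the result is reproducible
df = pd.DataFrame({'a': [1.0, 3.0, 2.0, 6.0, 5.0], 'b': [0, 0, 1, 1, 2]})

# Passing a list of functions yields two-level columns: (column, function)
multi = df.groupby('b').agg(['mean', 'std', 'count'])
print(multi.columns.tolist())

# Named aggregation: output_name=(input_column, function) gives flat columns;
# reset_index() turns the group key back into a regular column
summary = (
    df.groupby('b')
      .agg(mean=('a', 'mean'), std=('a', 'std'), count=('a', 'count'))
      .reset_index()
)
print(summary)
```

With several group keys, such as ['A','B'] in the question, the same pattern should apply: each key becomes its own column after reset_index().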