Pandas groupwise percentage

2024/10/5 15:07:05

How can I calculate a group-wise percentage in pandas?

Similar to "Pandas: .groupby().size() and percentages" or "Pandas Very Simple Percent of total size from Group by", I want to calculate the percentage of a value per group.

How can I achieve this?

My dataset is structured like

ClassLabel, Field

Initially, I aggregate on both ClassLabel and Field like

grouped = mydf.groupby(['Field', 'ClassLabel']).size().reset_index()
grouped = grouped.rename(columns={0: 'customersCountPerGroup'})

Now I would like to know the percentage of customers in each group on a per-group basis. The group totals can be obtained with mydf.groupby(['Field']).size(), but I cannot merge that as a column, nor am I sure this is the right approach. There must be something simpler.

edit

I want to calculate the percentage based only on a single group. For example, for Field 3, where ClassLabel 0 has 0.125 and ClassLabel 1 has 0.250, the sum over ClassLabel 0 and 1 is 0.125 + 0.250 = 0.375. I want to use this value to divide / normalize grouped, and not grouped.sum().

Answer

IIUC you can use:

import pandas as pd

mydf = pd.DataFrame({'Field':[1,1,3,3,3],'ClassLabel':[4,4,4,4,4],'A':[7,8,9,5,7]})
print(mydf)
   A  ClassLabel  Field
0  7           4      1
1  8           4      1
2  9           4      3
3  5           4      3
4  7           4      3

grouped = mydf.groupby(['Field', 'ClassLabel']).size()
print(grouped)
Field  ClassLabel
1      4             2
3      4             3
dtype: int64

print(100 * grouped / grouped.sum())
Field  ClassLabel
1      4             40.0
3      4             60.0
dtype: float64

grouped = mydf.groupby(['Field', 'ClassLabel']).size().reset_index()
grouped = grouped.rename(columns={0: 'customersCountPerGroup'})
print(grouped)
   Field  ClassLabel  customersCountPerGroup
0      1           4                       2
1      3           4                       3

grouped['per'] = 100 * grouped.customersCountPerGroup / grouped.customersCountPerGroup.sum()
print(grouped)
   Field  ClassLabel  customersCountPerGroup   per
0      1           4                       2  40.0
1      3           4                       3  60.0
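
As a shorter variant of the percent-of-total calculation above, here is a minimal sketch using DataFrame.value_counts with normalize=True. It assumes pandas 1.1 or newer (where DataFrame.value_counts is available) and reuses the sample frame from this answer; the column name per is kept only for consistency with the output above.

```python
import pandas as pd

mydf = pd.DataFrame({
    'Field': [1, 1, 3, 3, 3],
    'ClassLabel': [4, 4, 4, 4, 4],
    'A': [7, 8, 9, 5, 7],
})

# Fraction of all rows per (Field, ClassLabel) pair, scaled to a percentage.
# sort_index() restores group order, since value_counts sorts by count.
per = (
    mydf.value_counts(['Field', 'ClassLabel'], normalize=True)
        .mul(100)
        .sort_index()
        .reset_index(name='per')
)
print(per)
```

This skips the intermediate size() and rename steps, at the cost of not keeping the raw counts in the result.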

EDIT by comment:

mydf = pd.DataFrame({'Field':[1,1,3,3,3,4,5,6],'ClassLabel':[0,0,0,1,1,0,0,6],'A':[7,8,9,5,7,5,6,4]})
print(mydf)

grouped = mydf.groupby(['Field', 'ClassLabel']).size()
# share of the grand total for each (Field, ClassLabel) pair
df = grouped / grouped.sum()
# divide each count by the summed share of its Field
df = (grouped / df.groupby(level=0).transform('sum')).reset_index(name='new')
print(df)
   Field  ClassLabel       new
0      1           0  8.000000
1      3           0  2.666667
2      3           1  5.333333
3      4           0  8.000000
4      5           0  8.000000
5      6           6  8.000000
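
If the goal is instead each ClassLabel's share within its own Field (so the values for one Field add up to 100), a minimal sketch could look like the following. This is one possible reading of the question, not necessarily what the asker meant. The names field_totals, fieldTotal and pctWithinField are made up for illustration. The second half shows the merge route mentioned in the question, which should give the same numbers.

```python
import pandas as pd

mydf = pd.DataFrame({
    'Field': [1, 1, 3, 3, 3, 4, 5, 6],
    'ClassLabel': [0, 0, 0, 1, 1, 0, 0, 6],
    'A': [7, 8, 9, 5, 7, 5, 6, 4],
})

# Rows per (Field, ClassLabel) pair
grouped = mydf.groupby(['Field', 'ClassLabel']).size()

# Total rows per Field, broadcast back onto every (Field, ClassLabel) entry
field_totals = grouped.groupby(level='Field').transform('sum')

# Percentage of each ClassLabel within its own Field
pct = (100 * grouped / field_totals).reset_index(name='pctWithinField')
print(pct)

# Same idea via a merge: attach the per-Field total as a column, then divide
counts = grouped.reset_index(name='customersCountPerGroup')
totals = mydf.groupby('Field').size().reset_index(name='fieldTotal')
merged = counts.merge(totals, on='Field')
merged['pctWithinField'] = (
    100 * merged['customersCountPerGroup'] / merged['fieldTotal']
)
print(merged)
```

The transform version avoids the extra frame and the join; the merge version keeps the per-Field total visible as its own column, which can be handy for checking the result.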
