Pandas groupby and sum total of group

2024/11/13 21:39:44

I have a Pandas DataFrame with customer refund reasons. It contains these example data rows:

    **case_type**       **claim_type**
1   service             service
2   service             service
3   chargeback          service
4   chargeback          local_charges
5   service             supplier_service
6   chargeback          service
7   chargeback          service
8   chargeback          service
9   chargeback          service
10  chargeback          service
11  service             service_not_used
12  service             service_not_used

I would like to compare the customer's reason with some sort of labeled reason. This is no problem, but I would also like to see the total number of records in a specific group (customer reason).

case_claim_type = df[["case_type", "claim_type"]]
case_claim_type.groupby(by=("case_type", "claim_type"))["case_type"].count()

Which gives me this output, for example:

**case_type**     **claim_type**                 
service           service                         2supplier_service                1service_not_used                2
chargeback        service                         6local_charges                   1

I would also like to have have the sum of the output per case_type. Something like:

**case_type**     **claim_type**                 
service           service                         2supplier_service                1service_not_used                2total:                          5
chargeback        service                         6local_charges                   1total:                          7

It doesn't necessarily has to be in this last output format, a column with the (aggregated) totals per case_type is also fine.

Answer

Where:

df = pd.DataFrame({'case_type':['Service']*20+['chargeback']*9,'claim_type':['service']*5+['local_charges']*5+['service_not_used']*5+['supplier_service']*5+['service']*8+['local_charges']})df_out = df.groupby(by=("case_type", "claim_type"))["case_type"].count()

Let use pd.concat, sum with level parameter, and assign:

(pd.concat([df_out.to_frame(),df_out.sum(level=0).to_frame().assign(claim_type= "total").set_index('claim_type', append=True)]).sort_index())

Output:

                             case_type
case_type  claim_type                 
Service    local_charges             5service                   5service_not_used          5supplier_service          5total                    20
chargeback local_charges             1service                   8total                     9
https://en.xdnf.cn/q/71993.html

Related Q&A

Capture webcam video using PyQt

Given the following PyQt code, I can perfectly capture the webcams streaming video. Now, I want to modify code, so a button named capture button is added that once pressed captures the streaming video …

Plot a 3d surface from a list of lists using matplotlib

Ive searched around for a bit, and whhile I can find many useful examples of meshgrid, none shhow clearly how I can get data from my list of lists into an acceptable form for any of the varied ways Ive…

Super fast way to compare if two strings are equal

Obviously, in Python to check whether two strings are equal you can do:"hello word" == "hello world"But what if you are comparing really long strings (in excess of 1m characters)? …

Pandas DataFrames in reportlab

I have a DataFrame, and want to output it to a pdf. Im currently trying to use ReportLab for this, but it wont seem to work. I get an error here:mytable = Table(make_pivot_table(data, pivot_cols, colum…

How to open and close a website using default browser with python

Im trying to write a python script on windows platform to open a webpage(such as Google), and then, after 10 seconds, close this website. Note: Im using Windows 7, Python 2.7.10, and IE

Comparing numpy array with itself by element efficiently

I am performing a large number of these calculations:A == A[np.newaxis].Twhere A is a dense numpy array which frequently has common values.For benchmarking purposes we can use:n = 30000 A = np.random.r…

Kivy: BoxLayout vs. GridLayout

BoxLayout(orientation=vertical) vs. GridLayout(cols=1):They both do the same thing, no? Is there a reason to choose one over the other?

Flask circular dependency

I am developing a Flask application. It is still relatively small. I had only one app.py file, but because I needed to do database migrations, I divided it into 3 using this guide:https://realpython.co…

How to create tox.ini variables

Is there a way to set arbitrary variables within tox.ini?An example would be a project name that might be used in a variety of ways. With a rather complex tox.ini, I find myself copy and pasting all …

How to apply json_normalize on entire pandas column

I have a dataframe with LISTS(with dicts) as column values . My intention is to normalize entire column(all rows). I found way to normalize a single row . However, Im unable to apply the same function …