python: use agg with more than one customized function

2024/9/20 12:40:58

I have a data frame like this.

mydf = pd.DataFrame({'a':[1,1,3,3],'b':[np.nan,2,3,6],'c':[1,3,3,9]})

   a    b  c
0  1  NaN  1
1  1  2.0  3
2  3  3.0  3
3  3  6.0  9

I would like to have a resulting dataframe like this.

myResults = pd.concat([mydf.groupby('a').apply(lambda x: (x.b/x.c).max()), mydf.groupby('a').apply(lambda x: (x.b/x.c).min())], axis =1)
myResults.columns = ['max','min']

        max       min
a
1  0.666667  0.666667
3  1.000000  0.666667

Basically, I would like the max and min of the ratio of column b to column c for each group (grouped by column a).

Is it possible to achieve this with agg? I tried mydf.groupby('a').agg([lambda x: (x.b/x.c).max(), lambda x: (x.b/x.c).min()]). It does not work; the column names b and c do not seem to be recognized.
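A small probe can show why b and c are not visible inside the lambdas. This is only a sketch: `probe` and `received` are hypothetical names introduced here, and the behaviour described is what I understand agg to do with a list of functions, namely call each function on one column of one group at a time.

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({'a': [1, 1, 3, 3], 'b': [np.nan, 2, 3, 6], 'c': [1, 3, 3, 9]})

received = []  # type name of every object that agg hands to the function

def probe(x):
    # Record what kind of object arrives, then return an ordinary aggregate
    received.append(type(x).__name__)
    return x.max()

result = mydf.groupby('a').agg([probe])

print(set(received))            # the kinds of objects the function was given
print(result.columns.tolist())  # one result column per input column
```

If the function only ever receives a single-column Series, then x.b and x.c have nothing to refer to, which would explain the failure.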

Another way I can think of is to add the ratio column to mydf first, i.e. mydf['ratio'] = mydf.b/mydf.c, and then use agg on the updated mydf, like mydf.groupby('a')['ratio'].agg(['max','min']).
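For reference, a minimal sketch of that ratio-column approach. It uses assign so that the original mydf is left unchanged; `ratio_stats` is a name chosen here for illustration only.

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({'a': [1, 1, 3, 3], 'b': [np.nan, 2, 3, 6], 'c': [1, 3, 3, 9]})

# assign() returns a copy with the extra column, so mydf itself is not modified
ratio_stats = (
    mydf.assign(ratio=mydf['b'] / mydf['c'])
        .groupby('a')['ratio']
        .agg(['max', 'min'])
)

print(ratio_stats)
```

The NaN ratio in the first row is skipped by max and min, so group 1 is left with the single value 2/3 for both.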

Is there a better way to achieve this through agg or another function? In summary, I would like to apply a customized function to a grouped DataFrame, where the function needs to read multiple columns from the original DataFrame.

Answer

You can use a customized function to achieve this.

You can create any number of new columns from any input columns with the function below.

def f(x):
    t = {}
    t['max'] = (x['b']/x['c']).max()
    t['min'] = (x['b']/x['c']).min()
    return pd.Series(t)

mydf.groupby('a').apply(f)

Output:

        max       min
a                    
1  0.666667  0.666667
3  1.000000  0.666667
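
When the customized function is just an aggregate of one derived quantity, as it is here, another option is to compute the ratio as a Series and group that Series by column a directly. This is a sketch of an alternative, not the method used in the answer above; `ratio` and `stats` are names chosen here for illustration.

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({'a': [1, 1, 3, 3], 'b': [np.nan, 2, 3, 6], 'c': [1, 3, 3, 9]})

# Compute the ratio once for the whole frame, without adding a column to mydf
ratio = mydf['b'] / mydf['c']

# Group the ratio Series by the values of column a, then aggregate
stats = ratio.groupby(mydf['a']).agg(['max', 'min'])

print(stats)
```

This avoids apply entirely, but it only fits cases where the per-group result can be written as built-in aggregates of a row-wise expression. A function like f remains the more general tool.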
