Python Pandas average based on condition into new column

2024/11/13 12:12:38

I have a pandas dataframe containing the following data:

matchID    server    court    speed
1          1         A         100
1          2         D         200
1          3         D         300
1          4         A         100
1          1         A         120
1          2         A         250
1          3         D         110
1          4         D         100
2          1         A         100
2          2         D         200
2          3         D         300
2          4         A         100
2          1         A         120
2          2         A         250
2          3         D         110
2          4         D         100

I would like to add two new columns containing the mean based on two conditions. The column meanSpeedCourtA13 shall contain the mean speed of servers 1 and 3 where court = A. This would be (100 + 120) / 2 = 110. The second column, named meanSpeedCourtD13, shall contain the mean speed of servers 1 and 3 where court = D. This would be (300 + 110) / 2 = 205.

Please note that this should be done for each matchID, so a groupby is also required. This means that solutions relying on positional indexing with iloc cannot be used.

The resulting dataframe should look as follows:

matchID    server    court     speed    meanSpeedCourtA13   meanSpeedCourtD13
1          1         A         100      110                 205
1          2         D         200      110                 205
1          3         D         300      110                 205
1          4         A         100      110                 205
1          1         A         120      110                 205
1          2         A         250      110                 205
1          3         D         110      110                 205
1          4         D         100      110                 205
2          1         A         100      110                 205        
2          2         D         200      110                 205        
2          3         D         300      110                 205        
2          4         A         100      110                 205        
2          1         A         120      110                 205        
2          2         A         250      110                 205        
2          3         D         110      110                 205        
2          4         D         100      110                 205        
Answer

Ok this got a bit more complicated. Normally I'd try something with transform but I'd be glad if someone had something better than the following:

Use groupby and pass each group to func, which uses .loc to select the matching rows and compute the two means. Then use pd.concat to glue the groups back together into one dataframe:

import pandas as pd

data = {'matchID': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 2, 9: 2, 10: 2, 11: 2, 12: 2, 13: 2, 14: 2, 15: 2},
        'court': {0: 'A', 1: 'D', 2: 'D', 3: 'A', 4: 'A', 5: 'A', 6: 'D', 7: 'D', 8: 'A', 9: 'D', 10: 'D', 11: 'A', 12: 'A', 13: 'A', 14: 'D', 15: 'D'},
        'speed': {0: 100, 1: 200, 2: 300, 3: 100, 4: 120, 5: 250, 6: 110, 7: 100, 8: 100, 9: 200, 10: 300, 11: 100, 12: 120, 13: 250, 14: 110, 15: 100},
        'server': {0: 1, 1: 2, 2: 3, 3: 4, 4: 1, 5: 2, 6: 3, 7: 4, 8: 1, 9: 2, 10: 3, 11: 4, 12: 1, 13: 2, 14: 3, 15: 4}}
df = pd.DataFrame(data)

def func(dfx):
    # Work on a copy so the new columns are not written to a slice of df
    dfx = dfx.copy()
    # Rows served by server 1 or 3 within this match
    servers13 = dfx.server.isin((1, 3))
    # Each scalar mean is broadcast to every row of the group
    dfx['meanSpeedCourtA13'] = dfx.loc[servers13 & (dfx.court == 'A'), 'speed'].mean()
    dfx['meanSpeedCourtD13'] = dfx.loc[servers13 & (dfx.court == 'D'), 'speed'].mean()
    return dfx

# Apply func to each matchID group and stack the results back together
newdf = pd.concat(func(dfx) for _, dfx in df.groupby('matchID'))
print(newdf)

Returns

   court  matchID  server  speed  meanSpeedCourtA13  meanSpeedCourtD13
0      A        1       1    100             110.00             205.00
1      D        1       2    200             110.00             205.00
2      D        1       3    300             110.00             205.00
3      A        1       4    100             110.00             205.00
4      A        1       1    120             110.00             205.00
5      A        1       2    250             110.00             205.00
6      D        1       3    110             110.00             205.00
7      D        1       4    100             110.00             205.00
8      A        2       1    100             110.00             205.00
9      D        2       2    200             110.00             205.00
10     D        2       3    300             110.00             205.00
11     A        2       4    100             110.00             205.00
12     A        2       1    120             110.00             205.00
13     A        2       2    250             110.00             205.00
14     D        2       3    110             110.00             205.00
15     D        2       4    100             110.00             205.00
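
For the transform-based route mentioned above, here is a minimal sketch. It is an alternative, not the code that produced the output shown, and it assumes the same column names as the question. The idea is to blank out the speeds that do not meet the condition with Series.where, then let groupby(...).transform("mean") broadcast the per-match mean (which skips NaN) back onto every row. The names servers_13 and masked are made up for illustration.

```python
import pandas as pd

# Same data as in the question, written compactly
df = pd.DataFrame({
    "matchID": [1] * 8 + [2] * 8,
    "server": [1, 2, 3, 4] * 4,
    "court": ["A", "D", "D", "A", "A", "A", "D", "D"] * 2,
    "speed": [100, 200, 300, 100, 120, 250, 110, 100] * 2,
})

# True for rows served by server 1 or 3
servers_13 = df["server"].isin([1, 3])

for court in ("A", "D"):
    # Keep speed only where the condition holds; all other rows become NaN
    masked = df["speed"].where(servers_13 & (df["court"] == court))
    # The group mean ignores NaN and is broadcast to every row of the match
    df[f"meanSpeedCourt{court}13"] = masked.groupby(df["matchID"]).transform("mean")

print(df)
```

This keeps the original row order and index and avoids splitting and re-concatenating the dataframe. If a match has no rows meeting a condition, the corresponding column is NaN for that match.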