Pandas merge and grouby

2024/9/20 2:46:49

I have 2 pandas dataframes which looks like below.

Data Frame 1:  Section Chainage    Frame  
R125R002    10.133  1  
R125R002    10.138  2  
R125R002    10.143  3  
R125R002    10.148  4  
R125R002    10.153  5  Data Frame 2:Section Chainage    1   2   3   4   5   6   7   8   
R125R002    10.133  0   0   1   0   0   0   0   0     
R125R002    10.134  0   0   1   0   0   0   0   0     
R125R002    10.135  0   0   1   0   0   0   0   0     
R125R002    10.136  0   0   1   0   0   0   0   0     
R125R002    10.137  0   0   1   0   0   0   0   0     
R125R002    10.138  0   0   1   0   0   0   0   0     
R125R002    10.139  0   0   1   0   0   0   0   0     
R125R002    10.14   0   0   1   0   0   0   0   0     
R125R002    10.141  0   0   1   0   0   0   0   0     
R125R002    10.142  0   0   1   0   0   0   0   0     
R125R002    10.143  0   0   1   0   0   0   0   0     
R125R002    10.144  0   0   1   0   0   0   0   0     
R125R002    10.145  0   0   1   0   0   0   0   0     
R125R002    10.146  0   0   1   0   0   0   0   0     
R125R002    10.147  0   0   1   0   0   0   0   0     
R125R002    10.148  0   0   1   0   0   0   0   0     
R125R002    10.149  0   0   1   0   0   0   0   0     
R125R002    10.15   0   0   1   0   0   0   0   0     
R125R002    10.151  0   0   1   0   0   0   0   0     
R125R002    10.152  0   0   1   0   0   0   0   0     
R125R002    10.153  0   0   1   0   0   0   0   0   

required output dataframe:

Section Chainage Frame  1   2   3   4   5   6   7   8   
R125R002    10.133  1   0   0   1   0   0   0   0   0     
R125R002    10.138  2   0   0   1   0   0   0   0   0     
R125R002    10.143  3   0   0   1   0   0   0   0   0     
R125R002    10.148  4   0   0   1   0   0   0   0   0     
R125R002    10.153  5   0   0   1   0   0   0   0   0     

Dataframe 2 has increment of 1 m interval while dataframe 1 has increment of 5 m. I would like merge dataframe 2 to dataframe 1 and apply group by. Groupby for column 1 is sum, column 2 max, colum3 to 8 average.

In sql, I would link section between between 2 frames and apply between condition for the chainage and then add groupby.
Is there any way to achieve this in pandas.

Answer

You can first aggregate by each 5 rows with define functions in dictionary:

d = {'Section':'first','Chainage':'first','1':'sum','2':'max', '8':'mean'}
df22 = df2.groupby([np.arange(len(df2.index)) // 5], as_index=False).agg(d)
print (df22)Section  Chainage  1  2  8
0  R125R002    10.133  0  0  0
1  R125R002    10.138  0  0  0
2  R125R002    10.143  0  0  0
3  R125R002    10.148  0  0  0
4  R125R002    10.153  0  0  0

Detail:

print (np.arange(len(df2.index)) // 5)
[0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4]

And then need merge:

df = df1.merge(df22, on=['Section','Chainage'])
print (df)Section  Chainage  Frame  1  2  8
0  R125R002    10.133      1  0  0  0
1  R125R002    10.138      2  0  0  0
2  R125R002    10.143      3  0  0  0
3  R125R002    10.148      4  0  0  0
4  R125R002    10.153      5  0  0  0
https://en.xdnf.cn/q/119410.html

Related Q&A

Find a pattern in the line of another file in python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.Want to improve this question? Update the question so it focuses on one problem only by editing this post.Closed 3…

AssertionError if running code in Python prompt but not if running as file

Why trying to explain here on stackoverflow what the Python command id() does and how can it be used to reveal how Python works under the hood I had run into following strange behavior I am struggling …

Remove values before and after special character

I have a dataframe, df, where I would like to remove the values that come before the underscore _ and after the underscore _ , essentially, keeping the middle. Also keeping the digits at the end and co…

Python selection sort

Question: The code is supposed to take a file (that contains one integer value per line), print the (unsorted) integer values, sort them, and then print the sorted values.Is there anything that doesnt…

Simple inheritance issue with Django templates

just getting started in Django, and I have some problems with the inheritances. It just seems that the loop for doesnt work when inheriting other template. Heres my code in base.html:<!DOCTYPE html&…

Replacing values in a list [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.This question was caused by a typo or a problem that can no longer be reproduced. While similar q…

Azure Release Pipeline - Environment variables on python script

Lately Ive been requested to run a python script on my Azure Release Pipeline. This script needs some environment variables for being executed, as Ive seen that in the build pipeline, the task include …

Problem with python prepared stmt parameter passing

File C:\Users\User\AppData\Local\Programs\Python\Python37\lib\site-packages\mysql\connector\cursor.py, line 1149, in execute elif len(self._prepared[parameters]) != len(params): TypeError: object of ty…

list of lists to list of tuples without loops or list comprehensions [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.Want to improve this question? Add details and clarify the problem by editing this post.Closed 6 years ago.Improve…

How can I merge CSV rows that have the same value in the first cell?

This is the file: https://drive.google.com/file/d/0B5v-nJeoVouHc25wTGdqaDV1WW8/view?usp=sharingAs you can see, there are duplicates in the first column, but if I were to combine the duplicate rows, no…