Python Pandas: Merging data frames on multiple conditions

2024/10/13 7:22:55

I want to merge two data frames, fetched via SQL, under multiple conditions.

  • df1: contains Customer ID, Cluster ID, and Customer Zone ID.
  • df2: contains Complain ID, RegistrationNumber, and Status.

df1 and df2 are shown below:

df1

Customer ID     Cluster ID  Customer Zone ID
CUS1001.A       CUS1001.X   CUS1000
CUS1001.B       CUS1001.X   CUS1000
CUS1001.C       CUS1001.X   CUS1000
CUS1001.D       CUS1001.X   CUS1000
CUS1001.E       CUS1001.X   CUS1000
CUS2001.A       CUS2001.X   CUS2000

df2:

Complain ID RegistrationNumber   Status
CUS3501.A       99231            open
CUS1001.B       21340            open
CUS1001.X       32100            open

I want to merge these two data frames with the following conditions:

if   (Complain ID == Customer ID):      merge on Customer ID
elif (Complain ID == Cluster ID):       merge on Cluster ID
elif (Complain ID == Customer Zone ID): merge on Customer Zone ID
else:                                   fill the row with zeros

Final result should look like this:

Customer ID Cluster ID  Customer Zone ID   Complain ID  Regi ID  Status
CUS1001.A   CUS1001.X       CUS1000         CUS1001.X    32100    open
CUS1001.B   CUS1001.X       CUS1000         CUS1001.B    21340    open
CUS1001.C   CUS1001.X       CUS1000         CUS1001.X    32100    open
CUS1001.D   CUS1001.X       CUS1000         CUS1001.X    32100    open
CUS1001.E   CUS1001.X       CUS1000         CUS1001.X    32100    open
CUS2001.A   CUS2001.X       CUS2000             0           0       0
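For reference, the sample frames from the question can be reconstructed like this (the column names follow the question's tables; treat them as assumptions about the real SQL schema):

```python
import pandas as pd

# df1: customer hierarchy (Customer ID -> Cluster ID -> Customer Zone ID)
df1 = pd.DataFrame({
    'Customer ID': ['CUS1001.A', 'CUS1001.B', 'CUS1001.C',
                    'CUS1001.D', 'CUS1001.E', 'CUS2001.A'],
    'Cluster ID': ['CUS1001.X'] * 5 + ['CUS2001.X'],
    'Customer Zone ID': ['CUS1000'] * 5 + ['CUS2000'],
})

# df2: complaints, keyed by an ID that may sit at any level of the hierarchy
df2 = pd.DataFrame({
    'Complain ID': ['CUS3501.A', 'CUS1001.B', 'CUS1001.X'],
    'RegistrationNumber': [99231, 21340, 32100],
    'Status': ['open', 'open', 'open'],
})

print(df1.shape, df2.shape)  # (6, 3) (3, 3)
```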

Please help!

Answer

Try this, using pandas melt, merge, and concat:

df = pd.melt(df1)
df = df.merge(df2, left_on='value', right_on='Complain ID', how='left')
df['number'] = df.groupby('variable').cumcount()
df = df.groupby('number').bfill()
Target = pd.concat([df1, df.iloc[:5, 2:6]], axis=1).fillna(0).drop('number', axis=1)
Target
Out[39]:
  Customer ID Cluster ID Customer Zone ID Complain ID  RegistrationNumber Status
0   CUS1001.A  CUS1001.X          CUS1000   CUS1001.X             32100.0   open
1   CUS1001.B  CUS1001.X          CUS1000   CUS1001.B             21340.0   open
2   CUS1001.C  CUS1001.X          CUS1000   CUS1001.X             32100.0   open
3   CUS1001.D  CUS1001.X          CUS1000   CUS1001.X             32100.0   open
4   CUS1001.E  CUS1001.X          CUS1000   CUS1001.X             32100.0   open
5   CUS2001.A  CUS2001.X          CUS2000           0                 0.0      0
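The melt/bfill chain works, but the if/elif priority in the question can also be expressed directly: pick the first of Customer ID, Cluster ID, Customer Zone ID that appears among the complain IDs, then do a single left merge on that key. This is a sketch, not part of the original answer:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Customer ID': ['CUS1001.A', 'CUS1001.B', 'CUS2001.A'],
    'Cluster ID': ['CUS1001.X', 'CUS1001.X', 'CUS2001.X'],
    'Customer Zone ID': ['CUS1000', 'CUS1000', 'CUS2000'],
})
df2 = pd.DataFrame({
    'Complain ID': ['CUS3501.A', 'CUS1001.B', 'CUS1001.X'],
    'RegistrationNumber': [99231, 21340, 32100],
    'Status': ['open', 'open', 'open'],
})

# Resolve the if/elif priority: Customer ID first, then Cluster ID,
# then Customer Zone ID; None when nothing matches.
complains = set(df2['Complain ID'])
key_cols = ['Customer ID', 'Cluster ID', 'Customer Zone ID']
df1['MatchId'] = df1[key_cols].apply(
    lambda row: next((v for v in row if v in complains), None), axis=1)

# Single left merge on the resolved key; unmatched rows become 0 as requested.
result = (df1.merge(df2, left_on='MatchId', right_on='Complain ID', how='left')
             .drop(columns='MatchId')
             .fillna(0))
print(result)
```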

Update: using numpy's intersect1d. Personally, I like this approach more than the previous one.

df1['MatchId'] = [np.intersect1d(x, df2.ComplainID.values)
                  for x in df1[['CustomerID', 'ClusterID']].values]
df1['MatchId'] = df1.MatchId.apply(pd.Series)
df1
Out[307]:
  CustomerID  ClusterID CustomerZoneID    MatchId
0  CUS1001.A  CUS1001.X        CUS1000  CUS1001.X
1  CUS1001.B  CUS1001.X        CUS1000  CUS1001.B
2  CUS1001.C  CUS1001.X        CUS1000  CUS1001.X
3  CUS1001.D  CUS1001.X        CUS1000  CUS1001.X
4  CUS1001.E  CUS1001.X        CUS1000  CUS1001.X
5  CUS2001.A  CUS2001.X        CUS2000        NaN

df1.merge(df2, left_on='MatchId', right_on='ComplainID', how='left')
Out[311]:
  CustomerID  ClusterID CustomerZoneID    MatchId ComplainID  RegistrationNumber Status
0  CUS1001.A  CUS1001.X        CUS1000  CUS1001.X  CUS1001.X             32100.0   open
1  CUS1001.B  CUS1001.X        CUS1000  CUS1001.B  CUS1001.B             21340.0   open
2  CUS1001.C  CUS1001.X        CUS1000  CUS1001.X  CUS1001.X             32100.0   open
3  CUS1001.D  CUS1001.X        CUS1000  CUS1001.X  CUS1001.X             32100.0   open
4  CUS1001.E  CUS1001.X        CUS1000  CUS1001.X  CUS1001.X             32100.0   open
5  CUS2001.A  CUS2001.X        CUS2000        NaN        NaN                 NaN    NaN
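A self-contained sketch of the intersect1d approach, including the fillna(0) step the question's expected output asks for (column names without spaces, as in the update). One caveat worth knowing: np.intersect1d returns a sorted intersection, so if both the Customer ID and the Cluster ID match, taking the first element picks the lexicographically smallest, not the strict if/elif priority:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'CustomerID': ['CUS1001.B', 'CUS2001.A'],
                    'ClusterID': ['CUS1001.X', 'CUS2001.X']})
df2 = pd.DataFrame({'ComplainID': ['CUS1001.B', 'CUS1001.X'],
                    'RegistrationNumber': [21340, 32100],
                    'Status': ['open', 'open']})

# Intersect each row's IDs with the complain IDs; an empty array means no match.
df1['MatchId'] = [m[0] if m.size else np.nan
                  for m in (np.intersect1d(x, df2.ComplainID.values)
                            for x in df1[['CustomerID', 'ClusterID']].values)]

# Left merge on the resolved key, then replace the NaN row with zeros.
merged = (df1.merge(df2, left_on='MatchId', right_on='ComplainID', how='left')
             .fillna(0))
print(merged)
```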
