Question 1

I have 2 dataframes left_df and right-df, which both have 20 columns with identical names and dtypes. right_df also has 2 additional columns with unique values on every row.

I want to update rows in right_df with ALL the values from left_df where the values in ALL columns in a list of a subset of columns, matching_cols = ['col_1', 'col_3', 'col_10', 'col_12'] are identical in both dataframes. The values in the additional 2 unique columns in right_df should be preserved.

Ideally, I want to also drop those rows from left_df in the same command, or as the next command if this isn't possible. I need to do this process more than once, matching on several different lists of columns, with the left_df dropping matched rows each loop, until eventually no further matches are found.

An acceptable alternative would be any method to create a new dataframe new_df containing the set of rows where all specified columns in the list matching_cols match, with values from left_df in the first 20 columns and values from right_df in the remaining 2 columns.

I don't care about preserving the indices at any point in either dataframe, I am importing them to SQL after this and will reindex them on one of the 2 right_df values at the end.

New to Pandas and can't determine what method to use, have tried variations of .merge, .join, .update, etc, but can't seem to specify to only update when my desired column values all match, or how to drop those rows/export them to a new df.

Update: Added pseudocode below:

For a left_df as:

left_df = pd.DataFrame({'col_0': ['0', '1', '2', '3', '4', '5'],'col_1': ['A', 'B', 'C', 'D', 'E', 'F'],'col_2': ['new', 'new', 'new', 'new', 'new', 'new'],'col_3': ['new', 'new', 'new', 'new', 'new', 'new'],'col_4': ['new', 'new', 'new', 'new', 'new', 'new'],'col_5': ['new', 'new', 'new', 'new', 'new', 'new'],'col_6': ['new', 'new', 'new', 'new', 'new', 'new'],'col_7': ['new', 'new', 'new', 'new', 'new', 'new'],                })

and a right_df as:

right_df = pd.DataFrame({'col_0': ['0', '1', '2', '3', '4', '5'],'col_1': ['A', 'B', 'C', 'X', 'E', 'F'],'col_2': ['old', 'old', 'old', 'old', 'old', 'old'],'col_3': ['old', 'old', 'old', 'old', 'old', 'old'],'col_4': ['old', 'old', 'old', 'old', 'old', 'old'],'col_5': ['old', 'old', 'old', 'old', 'old', 'old'],'col_6': ['old', 'old', 'old', 'old', 'old', 'old'],'col_7': ['old', 'old', 'old', 'old', 'old', 'old'],'col_8': ['uid_0', 'uid_1', 'uid_2', 'uid_3', 'uid_4', 'uid_5'],'col_9': ['uid_a', 'uid_b', 'uid_c', 'uid_d', 'uid_e', 'uid_f'],                })

Where matching_cols = ['col_0', 'col_1']

I want to get the following result either as a new dataframe or in-place on right_df (note that col_1 doesn't match on row 3, so is not changed)

  col_0 col_1 col_2 col_3 col_4 col_5 col_6 col_7  col_8  col_9
0     0     A   new   new   new   new   new   new  uid_0  uid_a
1     1     B   new   new   new   new   new   new  uid_1  uid_b
2     2     C   new   new   new   new   new   new  uid_2  uid_c
3     3     X   old   old   old   old   old   old  uid_3  uid_d
4     4     E   new   new   new   new   new   new  uid_4  uid_e
5     5     F   new   new   new   new   new   new  uid_5  uid_f

Question 2

Worked it out thanks to this post and the Pandas documentation:

First, it's a .merge I need, and I specify the suffixes as '_r' for only the columns to be copied from the right_df / for the old values I'm updating:

merged_df = pd.merge(left_df, right_df, on=['col_0', 'col_1'], suffixes=(None, '_r'))

This yields a new dataframe with rows containing both the new and old columns, only for rows in each dataframe where the values in columns on=['col_0', 'col_1'] are a match. Then I drop the "old" columns by using a regex filter on the text '_r':

merged_df.drop(list(merged_df.filter(regex = '_r')), axis=1, inplace=True)

This yields a dataframe with only the "modified" rows and no unmodified rows, which is close enough for what I need.

  col_0 col_1 col_2 col_3 col_4 col_5 col_6 col_7  col_8  col_9
0     0     A   new   new   new   new   new   new  uid_0  uid_a
1     1     B   new   new   new   new   new   new  uid_1  uid_b
2     2     C   new   new   new   new   new   new  uid_2  uid_c
3     4     E   new   new   new   new   new   new  uid_4  uid_e
4     5     F   new   new   new   new   new   new  uid_5  uid_f

Pandas - Update/Merge 2 Dataframes based on multiple matching column values

Related Q&A

How do I fix scrapy Unsupported URL scheme error?

comparing two timeseries dataframes based on some conditions in pandas

Game of Chance in Python 3.x?

Count occurence of a word by ID in python

ModuleNotFoundError: No module named plyer in Python

Floating point to 16 bit Twos Complement Binary, Python

Flag the first non zero column value with 1 and rest 0 having multiple columns

How to split up data from a column in a csv file into two separate output csv files?

Discord.py spellcheck commands

Django Model Form doesnt seem to validate the BooleanField