How to merge pandas tables by regex

2024/10/12 21:25:21

I am wondering if there is a fast way to merge two pandas tables by regular expression in Python.

For example: table A

col1 col2             
1    apple_3dollars_5        
2    apple_2dollar_4
1    orange_5dollar_3
1    apple_1dollar_3

table B

col1 col2
good (apple|oragne)_\dollars_5
bad  .*_1dollar_.*
ok   oragne_\ddollar_\d

Output:

col1 col2              col3
1    apple_3dollars_5  good
1    orange_5dollar_3  ok
1    apple_1dollar_3   bad

This is just an example. Instead of merging on one column that matches exactly, I want to join by regular expression. Thank you!

Answer

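First, fix the regular expressions in the B DataFrame (for example, \dollars should be \ddollars, and oragne in the last pattern should be orange):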

In [222]: B
Out[222]:
   col1                        col2
0  good  (apple|oragne)_\ddollars_5
1   bad               .*_1dollar_.*
2    ok          orange_\ddollar_\d
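
For reference, the two frames could be built as in the sketch below. The original post does not show how A and B were created, so the construction is an assumption; only the data comes from the post. Raw strings keep the backslashes in the patterns intact.

```python
import pandas as pd

# Table A from the question (how it was built is assumed; the post only shows the data).
A = pd.DataFrame({
    "col1": [1, 2, 1, 1],
    "col2": [
        "apple_3dollars_5",
        "apple_2dollar_4",
        "orange_5dollar_3",
        "apple_1dollar_3",
    ],
})

# Table B with the corrected patterns.
# Raw strings (r"...") stop Python from treating \d as a string escape.
B = pd.DataFrame({
    "col1": ["good", "bad", "ok"],
    "col2": [
        r"(apple|oragne)_\ddollars_5",
        r".*_1dollar_.*",
        r"orange_\ddollar_\d",
    ],
})
```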

Now we can prepare the following variables:

In [223]: to_repl = B.col2.values.tolist()

In [224]: vals = B.col1.values.tolist()

In [225]: to_repl
Out[225]: ['(apple|oragne)_\\ddollars_5', '.*_1dollar_.*', 'orange_\\ddollar_\\d']

In [226]: vals
Out[226]: ['good', 'bad', 'ok']

Finally we can use them in the replace function:

In [227]: A['col3'] = A['col2'].replace(to_repl, vals, regex=True)

In [228]: A
Out[228]:
   col1              col2             col3
0     1  apple_3dollars_5             good
1     2   apple_2dollar_4  apple_2dollar_4
2     1  orange_5dollar_3               ok
3     1   apple_1dollar_3              bad
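
Note that the row matching no pattern (index 1) keeps its original col2 value in col3, whereas the expected output in the question omits that row. The following is a minimal sketch of one way to drop it, assuming the frames are built as shown here. The names matched and result are illustrative and not part of the original answer, and the "col3 differs from col2" test is a heuristic that assumes no label is identical to an input string.

```python
import pandas as pd

# Frames rebuilt from the data in the post (construction is assumed).
A = pd.DataFrame({
    "col1": [1, 2, 1, 1],
    "col2": ["apple_3dollars_5", "apple_2dollar_4",
             "orange_5dollar_3", "apple_1dollar_3"],
})
B = pd.DataFrame({
    "col1": ["good", "bad", "ok"],
    "col2": [r"(apple|oragne)_\ddollars_5", r".*_1dollar_.*", r"orange_\ddollar_\d"],
})

to_repl = B["col2"].tolist()  # regex patterns
vals = B["col1"].tolist()     # labels, in the same order as the patterns

# Same replace call as in the answer above.
A["col3"] = A["col2"].replace(to_repl, vals, regex=True)

# Heuristic: a row matched some pattern if replace() changed its value.
matched = A["col3"] != A["col2"]

# Keep only the matched rows, as in the question's expected output.
result = A[matched].reset_index(drop=True)
print(result)
```

One caveat: with regex=True, replace substitutes only the matched portion of each string, much like re.sub. The patterns in this example happen to cover the whole string, but a pattern that matches only part of a value would leave the rest of the text in col3.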