I am wondering if there a fast way to merge two pandas tables by the regular expression in python .
For example:
table A
col1 col2
1 apple_3dollars_5
2 apple_2dollar_4
1 orange_5dollar_3
1 apple_1dollar_3
table B
col1 col2
good (apple|oragne)_\dollars_5
bad .*_1dollar_.*
ok oragne_\ddollar_\d
Output:
col1 col2 col3
1 apple_3dollars_5 good
1 orange_5dollar_3 ok
1 apple_1dollar_3 bad
this is just an example, what I want is instead of merging by one col that exactly match, I want to join by some regular expression. Thank you!
First of all fix RegEx'es in the B
DataFrame:
In [222]: B
Out[222]:col1 col2
0 good (apple|oragne)_\ddollars_5
1 bad .*_1dollar_.*
2 ok orange_\ddollar_\d
Now we can prepare the following variables:
In [223]: to_repl = B.col2.values.tolist()In [224]: vals = B.col1.values.tolist()In [225]: to_repl
Out[225]: ['(apple|oragne)_\\ddollars_5', '.*_1dollar_.*', 'orange_\\ddollar_\\d']In [226]: vals
Out[226]: ['good', 'bad', 'ok']
Finally we can use them in the replace function:
In [227]: A['col3'] = A['col2'].replace(to_repl, vals, regex=True)In [228]: A
Out[228]:col1 col2 col3
0 1 apple_3dollars_5 good
1 2 apple_2dollar_4 apple_2dollar_4
2 1 orange_5dollar_3 ok
3 1 apple_1dollar_3 bad