How to pick the rows which contain all the keywords? [closed]

2024/10/5 18:27:56

I have two CSV files, shown below:

File-1

procedure   code
anand database  321-87
shiva network   321-123
jana audit  321-56
kalai recruitment   321-10

In File-1, each word in the procedure column of a row is a keyword.

File-2

s.no    procedure
1   kalai has a recruitment group
2   shiva is the network person in my office
3   he is the auditor in my office
4   anand is the database here
5   i bought a new phone this week
6   jana is working in the audit team

In the above scenario, I need to pick the rows in File-2 that contain all the keywords of a row in File-1. For example, row 1 in File-1 contains two keywords, 'anand' and 'database', so I need to select the row in File-2 that contains both 'anand' and 'database'.

Can anyone help me with this?

Answer

If df (File-1) is relatively small, you could use str.contains with a regular expression; df2 below is File-2. First, build a pattern from df.procedure.

df
           procedure     code
0     anand database   321-87
1      shiva network  321-123
2         jana audit   321-56
3  kalai recruitment   321-10

p = df.procedure.str.split().str.join('.*?').str.cat(sep='|')
p
'anand.*?database|shiva.*?network|jana.*?audit|kalai.*?recruitment'
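It may help to check what this pattern does and does not match before applying it. The following is a minimal sketch using only the standard re module; the second and third test sentences are made up for illustration and are not in File-2 as written. It suggests that each alternative only matches when the keywords appear in the same order as in File-1.

```python
import re

# Pattern copied from the answer above.
p = 'anand.*?database|shiva.*?network|jana.*?audit|kalai.*?recruitment'

# Keywords in the same order as in File-1: expected to match.
in_order = re.search(p, 'anand is the database here') is not None

# Same keywords in reverse order (hypothetical sentence): no match,
# because 'anand.*?database' needs 'anand' to come first.
reversed_order = re.search(p, 'the database is run by anand') is not None

# Only one keyword of a pair present: no match.
one_keyword = re.search(p, 'he is the auditor in my office') is not None

print(in_order, reversed_order, one_keyword)  # True False False
```

If the keywords can appear in any order in File-2, this pattern would miss those rows, so it fits best when the order is known to be fixed.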

Now, pass it to str.contains on df2.procedure.

df2[df2.procedure.str.contains(p)]
   s.no                                 procedure
0     1             kalai has a recruitment group
1     2  shiva is the network person in my office
3     4                anand is the database here
5     6         jana is working in the audit team
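If the keywords may appear in any order, one possible alternative is to compare word sets instead of using a regex. This is a sketch under assumptions, not the method from the answer: the two frames are rebuilt by hand from the sample data, has_all_keywords is a hypothetical helper name, and matching is on whole whitespace-separated words (so 'auditor' would not count as 'audit', unlike str.contains).

```python
import pandas as pd

# Sample data rebuilt by hand from File-1 and File-2 in the question.
df = pd.DataFrame({
    'procedure': ['anand database', 'shiva network',
                  'jana audit', 'kalai recruitment'],
    'code': ['321-87', '321-123', '321-56', '321-10'],
})
df2 = pd.DataFrame({
    's.no': [1, 2, 3, 4, 5, 6],
    'procedure': [
        'kalai has a recruitment group',
        'shiva is the network person in my office',
        'he is the auditor in my office',
        'anand is the database here',
        'i bought a new phone this week',
        'jana is working in the audit team',
    ],
})

# One set of keywords per File-1 row.
keyword_sets = [set(words) for words in df['procedure'].str.split()]


def has_all_keywords(text):
    """Hypothetical helper: True if the text contains every keyword
    of at least one File-1 row, in any order, as whole words."""
    words = set(text.split())
    return any(keys <= words for keys in keyword_sets)


result = df2[df2['procedure'].apply(has_all_keywords)]
print(result)
```

On the sample data this selects the same four rows (s.no 1, 2, 4 and 6) as the regex approach. It loops in Python for every row, so it is likely slower on large frames.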