Creating a list of keywords by scrolling through a dataframe (python)

2024/10/6 16:20:34

I have a dataframe that looks like this:

dataFrame = pd.DataFrame({
    'Name': (
        "' Verbundmörtel ', ' Compound Mortar ', ' Malta per stucchi e per incollaggio '",
        "' StoLevell In Absolute ', ' StoLevell In Absolute '",
        "' Anhydrit-Fließestrich ', ' Anhydrite Flowing Screed ', ' Massetto a base di anidrite '",
        "' Ansetzmörtel SLP ', ' Attachment mortar SLP ', ' Malta minerale adesiva SLP + iQ-Fix '",
        "' AQUAPANEL Cement Mörtel ', ' AQUAPANEL Cement Mortar '",
        "' Armatop por ', ' Armatop por '",
        "' Armatop por ', ' Armatop por '",
    ),
    'File_name': (
        "esiveCoveringPlaster_2",
        "AdhesiveMortarLevellInForAEVERO_720",
        "AnhydriteFlowingScreed_20",
        "AnsetzmoertelSLPRemmers_21",
        "AquaboardMoertel_655",
        "ArmatopPor479korr_797",
        "ArmatopPor_479",
    ),
})

And the keywords I am searching for:

words = ['Mortar', 'hist', 'lime', 'loam', 'adhesive', 'clay', 'cement', 'insulation', 'sealing', 'light', 'base', 'glue', 'gyps', 'mineral', 'fine', 'Levelling', 'mould', 'Silicate', 'Porous', 'Concrete', 'screed', 'Rendering', 'Silicate', 'Renovation', 'Perlite', 'Waterproof', 'Porous', 'Old', 'Inside', 'por']

I would like to obtain, for each row, a list of the keywords it contains. I have tried two methods, but neither gives the desired result.

METHOD 1

test = ((dataFrame['Name'] + dataFrame['File_name'])).str.findall('|'.join(words),flags=re.IGNORECASE).map(','.join)

RESULT 1

0   Mortar
1   Adhesive,Mortar
2   Screed,base,Screed
3   mortar,mineral
4   Cement,Cement,Mortar
5   por,por,Por
6   por,por,Por

METHOD 2

test = pd.concat([(dataFrame['Name'] + dataFrame['File_name']).str.contains(word, case=False).map({True: word, False: ''}) for word in words], axis=1).agg(list, axis=1).str.join(',').str.strip(',')

RESULT 2

0   Mortar
1   Mortar,,,,adhesive
2   base,,,,,,,,,screed
3   Mortar,,,,,,,,,,,,,mineral
4   Mortar,,,,,,cement
5   por
6   por

My goal is to find which of the keywords occur in the two columns and to add the result to the dataframe as a new column. I expect one list of words per row:

words = [['Mortar'],['Mortar', 'adhesive'],['Base', 'screed'],['Mortar', 'mineral'],['Mortar', 'cement'],['por'],['por']]

I am creating scatterplots, and the "hue" parameter will have to refer to the second column. I hope I have made myself clear enough.

Answer

Your exact problem is a bit unclear, but I assume that you only want to find which of the words in your list words occur anywhere in your dataframe.

import re

words = ['Mortar', 'hist', 'lime', 'loam', 'adhesive', 'clay', 'cement', 'insulation', 'sealing', 'light', 'base', 'glue', 'gyps', 'mineral', 'fine', 'Levelling', 'mould', 'Silicate', 'Porous', 'Concrete', 'screed', 'Rendering', 'Silicate', 'Renovation', 'Perlite', 'Waterproof', 'Porous', 'Old', 'Inside', 'por']

# Build one alternation pattern from all keywords (case-sensitive).
pattern = "|".join(words)
regexW = re.compile(pattern)

# Search the string representation of the whole dataframe at once.
regexW.findall(str(dataFrame.values))

['Mortar', 'Mortar', 'base', 'mineral', 'Mortar', 'por', 'por', 'por', 'por']
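The search above returns one flat, case-sensitive list for the whole dataframe. If what is wanted is one list per row, without duplicates and without empty entries, something along the following lines might work. This is only a sketch under a few assumptions: find_keywords is a hypothetical helper name (it is not part of pandas), matching is a plain case-insensitive substring test, each keyword is reported once per row in the order of the words list, and the spelling is taken from words (so it yields 'base' rather than the 'Base' shown in the expected output). The two sample strings are rows 1 and 2 of the question, with Name and File_name concatenated.

```python
# Keyword list from the question, with the two missing commas restored.
words = ['Mortar', 'hist', 'lime', 'loam', 'adhesive', 'clay', 'cement',
         'insulation', 'sealing', 'light', 'base', 'glue', 'gyps', 'mineral',
         'fine', 'Levelling', 'mould', 'Silicate', 'Porous', 'Concrete',
         'screed', 'Rendering', 'Silicate', 'Renovation', 'Perlite',
         'Waterproof', 'Porous', 'Old', 'Inside', 'por']


def find_keywords(text, keywords=words):
    """Return the keywords contained in text (hypothetical helper).

    Matching ignores case; each keyword appears at most once, in the
    order in which it is listed in `keywords`.
    """
    lowered = text.lower()
    found = []
    for keyword in keywords:
        if keyword.lower() in lowered and keyword not in found:
            found.append(keyword)
    return found


# Rows 1 and 2 of the question: Name + File_name joined into one string.
row1 = ("' StoLevell In Absolute ', ' StoLevell In Absolute '"
        "AdhesiveMortarLevellInForAEVERO_720")
row2 = ("' Anhydrit-Fließestrich ', ' Anhydrite Flowing Screed ', "
        "' Massetto a base di anidrite 'AnhydriteFlowingScreed_20")

print(find_keywords(row1))  # ['Mortar', 'adhesive']
print(find_keywords(row2))  # ['base', 'screed']
```

On the dataframe from the question, the helper could presumably be applied with (dataFrame['Name'] + dataFrame['File_name']).map(find_keywords) and the result assigned to a new column. For a scatterplot hue, which likely needs a single hashable value per row rather than a list, the lists could be joined into one string first, for example with .str.join(',').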