I have a dataframe that looks like this:
import re

import pandas as pd

dataFrame = pd.DataFrame({
    'Name': [
        "' Verbundmörtel ', ' Compound Mortar ', ' Malta per stucchi e per incollaggio '",
        "' StoLevell In Absolute ', ' StoLevell In Absolute '",
        "' Anhydrit-Fließestrich ', ' Anhydrite Flowing Screed ', ' Massetto a base di anidrite '",
        "' Ansetzmörtel SLP ', ' Attachment mortar SLP ', ' Malta minerale adesiva SLP + iQ-Fix '",
        "' AQUAPANEL Cement Mörtel ', ' AQUAPANEL Cement Mortar '",
        "' Armatop por ', ' Armatop por '",
        "' Armatop por ', ' Armatop por '",
    ],
    'File_name': [
        "esiveCoveringPlaster_2",
        "AdhesiveMortarLevellInForAEVERO_720",
        "AnhydriteFlowingScreed_20",
        "AnsetzmoertelSLPRemmers_21",
        "AquaboardMoertel_655",
        "ArmatopPor479korr_797",
        "ArmatopPor_479",
    ],
})
And the keywords I am searching for:
words = ['Mortar', 'hist', 'lime', 'loam', 'adhesive', 'clay', 'cement', 'insulation', 'sealing', 'light', 'base', 'glue', 'gyps', 'mineral', 'fine', 'Levelling', 'mould', 'Silicate', 'Porous', 'Concrete', 'screed', 'Rendering', 'Renovation', 'Perlite', 'Waterproof', 'Old', 'Inside', 'por']
For each row, I would like to obtain a list of the keywords it contains. I have tried two methods, but neither gives the desired result:
METHOD 1
test = ((dataFrame['Name'] + dataFrame['File_name'])).str.findall('|'.join(words),flags=re.IGNORECASE).map(','.join)
RESULT 1
0 Mortar
1 Adhesive,Mortar
2 Screed,base,Screed
3 mortar,mineral
4 Cement,Cement,Mortar
5 por,por,Por
6 por,por,Por
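Method 1 returns every occurrence, in the casing it has in the text, which is why duplicates such as Cement,Cement and spellings such as Screed appear. A minimal sketch, assuming the desired result is the unique keywords in the spelling and order of `words`: post-process the `findall` output by mapping each match back to its keyword and dropping repeats. It uses three of the question's rows; the names `spelling`, `position` and `normalise` are illustrative, and joining the two columns with a space (so that no keyword is matched across the boundary) is an assumption.

```python
import re

import pandas as pd

# Keyword list, each keyword written as a separate string
words = ['Mortar', 'hist', 'lime', 'loam', 'adhesive', 'clay', 'cement',
         'insulation', 'sealing', 'light', 'base', 'glue', 'gyps', 'mineral',
         'fine', 'Levelling', 'mould', 'Silicate', 'Porous', 'Concrete',
         'screed', 'Rendering', 'Renovation', 'Perlite', 'Waterproof',
         'Old', 'Inside', 'por']

# Three rows taken from the question's data
df = pd.DataFrame({
    'Name': ["' StoLevell In Absolute ', ' StoLevell In Absolute '",
             "' Anhydrit-Fließestrich ', ' Anhydrite Flowing Screed ', ' Massetto a base di anidrite '",
             "' AQUAPANEL Cement Mörtel ', ' AQUAPANEL Cement Mortar '"],
    'File_name': ['AdhesiveMortarLevellInForAEVERO_720',
                  'AnhydriteFlowingScreed_20',
                  'AquaboardMoertel_655'],
})

# Lower-cased keyword -> keyword as spelled in `words`; keyword -> list position
spelling = {}
position = {}
for i, w in enumerate(words):
    spelling.setdefault(w.lower(), w)
    position.setdefault(w, i)

# re.escape keeps any regex metacharacters in a keyword literal
pattern = '|'.join(map(re.escape, words))


def normalise(matches):
    """Map raw findall matches to unique keywords, ordered as in `words`."""
    found = {spelling[m.lower()] for m in matches}
    return sorted(found, key=position.get)


# Hypothetical separator: a space between the columns stops cross-boundary matches
text = df['Name'] + ' ' + df['File_name']
df['Keywords'] = text.str.findall(pattern, flags=re.IGNORECASE).map(normalise)
print(df['Keywords'].tolist())
```

Sorting by position in `words` reproduces the order used in the expected result (for example Mortar before adhesive), rather than the order in which the words appear in the text.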
METHOD 2
test = pd.concat(
    [(dataFrame['Name'] + dataFrame['File_name']).str.contains(word, case=False).map({True: word, False: ''})
     for word in words],
    axis=1,
).agg(list, axis=1).str.join(',').str.strip(',')
RESULT 2
0 Mortar
1 Mortar,,,,adhesive
2 base,,,,,,,,,screed
3 Mortar,,,,,,,,,,,,,mineral
4 Mortar,,,,,,cement
5 por
6 por
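Method 2 produces the runs of commas because every non-matching keyword is mapped to an empty string before the row is joined, and str.strip(',') only removes them at the ends. A sketch of the same idea that keeps the True/False table and reads the matching keyword names off each row, so there is nothing empty to strip. It uses three of the question's rows; `hits` is an illustrative name, and `regex=False` (treating each keyword as a plain substring) plus the space separator between the columns are assumptions.

```python
import pandas as pd

# Keyword list, each keyword written as a separate string
words = ['Mortar', 'hist', 'lime', 'loam', 'adhesive', 'clay', 'cement',
         'insulation', 'sealing', 'light', 'base', 'glue', 'gyps', 'mineral',
         'fine', 'Levelling', 'mould', 'Silicate', 'Porous', 'Concrete',
         'screed', 'Rendering', 'Renovation', 'Perlite', 'Waterproof',
         'Old', 'Inside', 'por']

# Three rows taken from the question's data
df = pd.DataFrame({
    'Name': ["' Verbundmörtel ', ' Compound Mortar ', ' Malta per stucchi e per incollaggio '",
             "' Ansetzmörtel SLP ', ' Attachment mortar SLP ', ' Malta minerale adesiva SLP + iQ-Fix '",
             "' Armatop por ', ' Armatop por '"],
    'File_name': ['esiveCoveringPlaster_2',
                  'AnsetzmoertelSLPRemmers_21',
                  'ArmatopPor479korr_797'],
})

text = df['Name'] + ' ' + df['File_name']

# One boolean column per keyword; the dict keeps the order of `words`
hits = pd.DataFrame({w: text.str.contains(w, case=False, regex=False) for w in words})

# Row by row, keep only the keywords (column names) whose cell is True
df['Keywords'] = [list(hits.columns[mask]) for mask in hits.to_numpy()]
print(df['Keywords'].tolist())
```

Because the columns of `hits` follow the order of `words`, each list comes out in the same order as in the expected result, and keywords that do not occur are simply left out instead of becoming empty strings.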
My goal is to find, for each row, which keywords occur in either of the two columns, and to add them to the dataframe as a new column holding one list per row. The expected result is:
expected = [['Mortar'], ['Mortar', 'adhesive'], ['base', 'screed'], ['Mortar', 'mineral'], ['Mortar', 'cement'], ['por'], ['por']]
I am creating scatter plots, and the "hue" parameter will refer to this new column. I hope I have made myself clear.
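If the plotting library is seaborn (an assumption; the question only mentions "hue"), a column of Python lists typically cannot be used as hue directly, because lists are unhashable and cannot serve as category labels. A minimal sketch that turns each list into a single label string first; the column name `Keyword_label` and the x/y columns in the commented-out plotting call are hypothetical.

```python
import pandas as pd

# Hypothetical keyword column in the shape of the expected result
df = pd.DataFrame({'Keywords': [['Mortar'], ['Mortar', 'adhesive'], ['base', 'screed']]})

# Join each list into one string so it can act as a category label
df['Keyword_label'] = df['Keywords'].str.join(', ')
print(df['Keyword_label'].tolist())

# With seaborn (assumed), the label column could then be passed as hue, e.g.:
# sns.scatterplot(data=df, x='some_x', y='some_y', hue='Keyword_label')
```

Each distinct combination, such as "Mortar, adhesive", then becomes its own hue category. If one colour per individual keyword is wanted instead, DataFrame.explode on the list column is one option, at the cost of duplicating rows that contain several keywords.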