I have a large dataset with several columns and about 10k rows in more than 100 CSV files. For now I am concerned with only one column, which holds messages, and from each message I want to extract two parameters. I searched extensively and found two solutions (ONE and TWO) that come close but do not quite solve this problem.
Input: the column is named "Text", and every message is a separate row in the CSV.
"Let's Bounce!😉 #[message_1]Loving the energy & Microphonic Mayhem while…" #[message_2]RT @IVijayboi: #[message_3] @Bdutt@sardesairajdeep@rahulkanwal@abhisarsharma@ppbajpayi@Abpnewd@Ndtv@Aajtak#Jihadimedia@Ibn7 happy #PresstitutesDay"RT @RakeshKhatri23: MY LIFE #[message_4]WITHOUT YOU ISLIKE FLOWERS WITHOUT FRAGRANCE 💞💞~True Love~"Me & my baby ðŸ¶â¤ï¸ðŸ‘ @ Home Sweet Home #[message_5]
The input is a CSV file with several other columns, but I am interested only in this one. I want to separate each @name and #keyword from the message into new columns, like this:
Expected output:
text, mentions, keywords
[message], NAN, NAN
[message], NAN, NAN
[message], @IVijayboi, #Jihadimedia @Bdutt #PresstitutesDay@sardesairajdeep @rahulkanwal @abhisarsharma @ppbajpayi @Abpnewd @Ndtv @Aajtak @Ibn7
As the input shows, the first and second messages have no @ or #, so their column values are NAN, but the third message has 10 @ mentions and 2 # keywords.
In simple words: how do I separate the @ mentioned names and the # keywords from each message into separate columns?
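To make the goal concrete, here is a minimal sketch of the extraction described above, using only Python's standard `re` module. It rests on assumptions that are not stated in the data: that a mention or keyword is the marker character followed by one or more word characters, and that a missing value can be represented by `None`. The names `MENTION_RE`, `KEYWORD_RE`, and `extract_tags` are hypothetical, and the sample string is a shortened version of the third message.

```python
import re

# Assumption: a mention/keyword is "@" or "#" followed by word characters.
# Placeholders such as "#[message_3]" do not match, because "[" is not a word character.
MENTION_RE = re.compile(r"@\w+")
KEYWORD_RE = re.compile(r"#\w+")


def extract_tags(text):
    """Return (mentions, keywords) as space-joined strings, or None when there are none."""
    mentions = MENTION_RE.findall(text)
    keywords = KEYWORD_RE.findall(text)
    # An empty join yields "", which is falsy, so "or None" marks the missing case.
    return (" ".join(mentions) or None, " ".join(keywords) or None)


# Shortened form of the third message from the input above.
sample = "RT @IVijayboi: @Bdutt@sardesairajdeep#Jihadimedia@Ibn7 happy #PresstitutesDay"
mentions, keywords = extract_tags(sample)
print(mentions)
print(keywords)
```

If the files are loaded with pandas, the same two patterns could probably be applied to the whole "Text" column with `Series.str.findall`, but that is untested here.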