Match values of different dataframes

2024/10/5 15:43:39

This is the main dataframe, which holds the original tweets ("original_ds_.csv"):

id              tweet              
---------------------------------------------
78           "onetoone"              
86           "maybe tomorrow"        
72           "thnk you"                

Then I extracted the conversation for each tweet, which gave me a second dataframe, "threads.csv".

This dataset holds the conversation tweets extracted from each original tweet:

id              tweet              conver_id
---------------------------------------------
34           "hello world"            78
36           "nice to have"           78
56           "just an exam"           72 

-The conver_id column holds the id of the tweet that started the conversation (taken from original_ds_.csv).

-In other words, conver_id is the "id" of original_ds_.csv.

-One original tweet can have one or more tweets associated with it in threads.csv (as with id 78 above).

Now my question is, how can I do this:

If an id in original_ds_.csv also appears in the conver_id column of threads.csv, add a new column to threads.csv called File_Name with the value SPANISH.

Answer

The logic is a bit strange to me, but if I understand correctly, starting from these dataframes:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'tweet': list('ABC')}, index=[78,86,72])
df2 = pd.DataFrame({'tweet': list('DEF'), 'conver_id': (78,78,12)}, index=(34,36,56))

>>> df1
   tweet
78     A
86     B
72     C

>>> df2
   tweet  conver_id
34     D         78
36     E         78
56     F         12

you can check, for each element of df2['conver_id'], whether it is in df1.index, and map it to SPANISH if True:

df2['File_Name'] = (np.vectorize({True: 'SPANISH',False: ''}.get)(df2['conver_id'].isin(df1.index)))

output:

   tweet  conver_id File_Name
34     D         78   SPANISH
36     E         78   SPANISH
56     F         12                  
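
As an alternative sketch, the same mapping can probably be written more directly with np.where instead of np.vectorize over a dict lookup. This is only an illustration on the example frames above, not a different method: the variable name mask is introduced here for readability, and the result should match the output shown.

```python
import numpy as np
import pandas as pd

# Same example frames as above; the index plays the role of the "id" column.
df1 = pd.DataFrame({'tweet': list('ABC')}, index=[78, 86, 72])
df2 = pd.DataFrame({'tweet': list('DEF'), 'conver_id': (78, 78, 12)},
                   index=(34, 36, 56))

# Boolean mask: True where the conversation id exists among the original ids.
mask = df2['conver_id'].isin(df1.index)

# np.where picks 'SPANISH' where the mask is True and '' everywhere else.
df2['File_Name'] = np.where(mask, 'SPANISH', '')

print(df2)
```

If the frames come straight from pd.read_csv and "id" is an ordinary column rather than the index (an assumption about how the CSV files are loaded), the check would use df1['id'] in place of df1.index.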

If this is not what you want, please update your question with the expected output.

