Columns and rows concatenation with a common value in another column

2024/7/7 5:25:08

In the table below, I want to concatenate the columns Tri_gram_sents and Value, and then join together all rows that have the same number in the column sentence.

Tri_gram_sents                 Value     sentence
(('<s>', '<s>'), 'ABC')        0.161681  1
(('<s>', 'ABC'), 'ABC')        0.472973  1
(('ABC', 'ABC'), 'ABC')        0.305732  1
(('ABC', 'ABC'), 'ABC')        0.005655  1
(('ABC', 'ABC'), '</s>')       0.434783  1
(('ABC', '</s>'), '</s>')      0.008547  1
(('<s>', '<s>'), 'DEF')        0.111111  2
(('<s>', 'DEF'), 'DEF')        0.039474  2
(('DEF', 'DEF'), 'DEF')        0.207317  2
(('DEF', 'DEF'), 'DEF')        0.074803  2
(('DEF', 'DEF'), '</s>')       0.037940  2
(('DEF', '</s>'), '</s>')      0.033163  2
(('<s>', '<s>'), 'GHI')        0.250000  3
(('<s>', 'GHI'), 'GHI')        0.103316  3
(('GHI', 'GHI'), 'GHI')        0.024155  3
(('GHI', 'GHI'), '</s>')       0.028302  3
(('GHI', '</s>'), '</s>')      0.117647  3
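To make the question reproducible, here is a sketch that rebuilds the table above as a pandas DataFrame. It assumes that Tri_gram_sents holds the printed trigrams as plain strings (the question does not say whether they are strings or real tuples), and the variable names rows and df are only illustrative.

```python
import pandas as pd

# Hypothetical reconstruction of the question's table.
# Assumption: Tri_gram_sents is stored as strings, exactly as printed.
rows = [
    ("(('<s>', '<s>'), 'ABC')", 0.161681, 1),
    ("(('<s>', 'ABC'), 'ABC')", 0.472973, 1),
    ("(('ABC', 'ABC'), 'ABC')", 0.305732, 1),
    ("(('ABC', 'ABC'), 'ABC')", 0.005655, 1),
    ("(('ABC', 'ABC'), '</s>')", 0.434783, 1),
    ("(('ABC', '</s>'), '</s>')", 0.008547, 1),
    ("(('<s>', '<s>'), 'DEF')", 0.111111, 2),
    ("(('<s>', 'DEF'), 'DEF')", 0.039474, 2),
    ("(('DEF', 'DEF'), 'DEF')", 0.207317, 2),
    ("(('DEF', 'DEF'), 'DEF')", 0.074803, 2),
    ("(('DEF', 'DEF'), '</s>')", 0.037940, 2),
    ("(('DEF', '</s>'), '</s>')", 0.033163, 2),
    ("(('<s>', '<s>'), 'GHI')", 0.250000, 3),
    ("(('<s>', 'GHI'), 'GHI')", 0.103316, 3),
    ("(('GHI', 'GHI'), 'GHI')", 0.024155, 3),
    ("(('GHI', 'GHI'), '</s>')", 0.028302, 3),
    ("(('GHI', '</s>'), '</s>')", 0.117647, 3),
]
df = pd.DataFrame(rows, columns=["Tri_gram_sents", "Value", "sentence"])

# Number of trigrams per sentence (each group becomes one output row)
print(df.groupby("sentence").size())
```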

For the rows above, I would get a total of 3 rows in another table, and my expected output looks like this:

(('<s>', '<s>'), 'ABC') 0.161681 (('<s>', 'ABC'), 'ABC') 0.472973 (('ABC', 'ABC'), 'ABC') 0.305732 (('ABC', 'ABC'), 'ABC') 0.005655 (('ABC', 'ABC'), '</s>') 0.434783 (('ABC', '</s>'), '</s>') 0.008547
(('<s>', '<s>'), 'DEF') 0.111111 (('<s>', 'DEF'), 'DEF') 0.039474 (('DEF', 'DEF'), 'DEF') 0.207317 (('DEF', 'DEF'), 'DEF') 0.074803 (('DEF', 'DEF'), '</s>') 0.037940 (('DEF', '</s>'), '</s>') 0.033163
(('<s>', '<s>'), 'GHI') 0.250000 (('<s>', 'GHI'), 'GHI') 0.103316 (('GHI', 'GHI'), 'GHI') 0.024155 (('GHI', 'GHI'), '</s>') 0.028302 (('GHI', '</s>'), '</s>') 0.117647
Answer

You can use groupby and join to create the expected output. One way is to build a helper column to_join from the columns Tri_gram_sents and Value, and then aggregate this column per sentence with agg:

df['to_join'] = df['Tri_gram_sents'].astype(str) + ' ' + df['Value'].astype(str)  # astype(str) also covers tuple trigrams
ser_output = df.groupby('sentence')['to_join'].agg(' '.join)
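A self-contained run of the helper-column approach on a small excerpt of the table (two trigrams from each of sentences 1 and 2) might look like this. It assumes Tri_gram_sents is a string column; groupby keeps the original row order inside each group, so the trigrams are joined in table order.

```python
import pandas as pd

# Small excerpt of the question's table; Tri_gram_sents assumed to be strings
df = pd.DataFrame({
    "Tri_gram_sents": ["(('<s>', '<s>'), 'ABC')", "(('<s>', 'ABC'), 'ABC')",
                       "(('<s>', '<s>'), 'DEF')", "(('<s>', 'DEF'), 'DEF')"],
    "Value": [0.161681, 0.472973, 0.111111, 0.039474],
    "sentence": [1, 1, 2, 2],
})

# Helper column: trigram text followed by its value
df["to_join"] = df["Tri_gram_sents"] + " " + df["Value"].astype(str)

# One space-separated string per sentence number
ser_output = df.groupby("sentence")["to_join"].agg(" ".join)
print(ser_output[1])
```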

Or you can do everything in one line with apply, without creating the helper column:

ser_output = df.groupby('sentence')[['Tri_gram_sents', 'Value']].apply(lambda df_g: ' '.join(df_g['Tri_gram_sents'].astype(str) + ' ' + df_g['Value'].astype(str)))
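As a quick sanity check, the sketch below compares the apply version with the agg version on the same small excerpt. Selecting only Tri_gram_sents and Value before apply keeps the grouping column out of the frame passed to the lambda, which avoids the deprecation warning that recent pandas versions emit when apply operates on grouping columns. The names ser_apply and ser_agg are only for illustration.

```python
import pandas as pd

# Small excerpt of the question's table; Tri_gram_sents assumed to be strings
df = pd.DataFrame({
    "Tri_gram_sents": ["(('<s>', '<s>'), 'ABC')", "(('<s>', 'ABC'), 'ABC')",
                       "(('<s>', '<s>'), 'DEF')", "(('<s>', 'DEF'), 'DEF')"],
    "Value": [0.161681, 0.472973, 0.111111, 0.039474],
    "sentence": [1, 1, 2, 2],
})

# One-liner with apply: each group g is a small DataFrame of the two selected columns
ser_apply = df.groupby("sentence")[["Tri_gram_sents", "Value"]].apply(
    lambda g: " ".join(g["Tri_gram_sents"] + " " + g["Value"].astype(str))
)

# Same result via agg on a combined Series grouped by the sentence column
combined = df["Tri_gram_sents"] + " " + df["Value"].astype(str)
ser_agg = combined.groupby(df["sentence"]).agg(" ".join)

print(ser_apply.tolist() == ser_agg.tolist())
```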

Either way, you get ser_output:

sentence
1    (('<s>', '<s>'), 'ABC') 0.161681 (('<s>', 'ABC...
2    (('<s>', '<s>'), 'DEF') 0.111111 (('<s>', 'DEF...
...

where the first element looks as expected:

"(('<s>', '<s>'), 'ABC') 0.161681 (('<s>', 'ABC'), 'ABC') 0.472973 (('ABC', 'ABC'), 'ABC') 0.305732 (('ABC', 'ABC'), 'ABC') 0.005655 (('ABC', 'ABC'), '</s>') 0.434783 (('ABC', '</s>'), '</s>') 0.008547"
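One detail worth checking: astype(str) writes 0.25 as "0.25", whereas the expected output for sentence 3 shows 0.250000. A sketch that formats every value with six decimals, under the additional assumption that Tri_gram_sents might hold real Python tuples rather than strings, could look like this (astype(str) turns each tuple into exactly the text shown in the table):

```python
import pandas as pd

# Hypothetical variant: the trigrams are real tuples, not strings
df = pd.DataFrame({
    "Tri_gram_sents": [(("<s>", "<s>"), "GHI"), (("<s>", "GHI"), "GHI")],
    "Value": [0.25, 0.103316],
    "sentence": [3, 3],
})

# str(tuple) gives "(('<s>', '<s>'), 'GHI')"; the format string pads floats to 6 decimals
to_join = df["Tri_gram_sents"].astype(str) + " " + df["Value"].map("{:.6f}".format)
ser_output = to_join.groupby(df["sentence"]).agg(" ".join)
print(ser_output[3])
```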
