Split list elements into sub-elements in a pandas DataFrame

2024/9/22 8:32:54

I have a dataframe like this:

Filtered_data
['defence possessed russia china', 'factors driving china modernise']
['force bolster pentagon', 'strike capabilities pentagon congress detailing china']
['missiles warheads', 'deterrent face continued advances']
......
......

I just want to split each list element into sub-elements (tokenized words). So the output I'm looking for is:

Filtered_data
[defence, possessed, russia, factors, driving, china, modernise]
[force, bolster, strike, capabilities, pentagon, congress, detailing, china]
[missiles, warheads, deterrent, face, continued, advances]

Here is the code I have tried:

for text in df['Filtered_data'].iteritems():
    for i in text.split():
        print (i)

Answer

Use a list comprehension with split and flattening:

df['Filtered_data'] = df['Filtered_data'].apply(lambda x: [z for y in x for z in y.split()])
print (df)
                                       Filtered_data
0  [defence, possessed, russia, china, factors, d...
1  [force, bolster, pentagon, strike, capabilitie...
2  [missiles, warheads, deterrent, face, continue...
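
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the rows shown in the question, and the helper name split_phrases is made up for illustration; it does the same thing as the lambda above.

```python
import pandas as pd

# Sample data reconstructed from the question: each cell holds a list of phrases.
df = pd.DataFrame({
    "Filtered_data": [
        ["defence possessed russia china", "factors driving china modernise"],
        ["force bolster pentagon", "strike capabilities pentagon congress detailing china"],
        ["missiles warheads", "deterrent face continued advances"],
    ]
})

def split_phrases(phrases):
    """Flatten a list of phrases into one list of whitespace-separated words."""
    return [word for phrase in phrases for word in phrase.split()]

# Apply the helper to every cell; duplicates such as 'china' are kept at this stage.
tokens = df["Filtered_data"].apply(split_phrases)
print(tokens.tolist())
```

A named function is easier to test on a single row than a lambda, which can help when a cell turns out not to be a list.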

EDIT:

For unique values, the standard way is to use sets (note that a set does not preserve the original word order):

df['Filtered_data'] = df['Filtered_data'].apply(lambda x: list(set([z for y in x for z in y.split()])))
print (df)
                                       Filtered_data
0  [russia, factors, defence, driving, china, mod...
1  [capabilities, detailing, china, force, pentag...
2  [deterrent, advances, face, warheads, missiles...

But if the ordering of values is important, use pandas.unique:

df['Filtered_data'] = df['Filtered_data'].apply(lambda x: pd.unique([z for y in x for z in y.split()]).tolist())
print (df)
                                       Filtered_data
0  [defence, possessed, russia, china, factors, d...
1  [force, bolster, pentagon, strike, capabilitie...
2  [missiles, warheads, deterrent, face, continue...
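
An order-preserving alternative that does not go through pandas.unique is dict.fromkeys, since dicts keep insertion order in Python 3.7+. This is only a sketch: the helper name split_unique_ordered is hypothetical, and it may be worth considering because, as far as I know, newer pandas releases warn when pd.unique is given a plain list.

```python
def split_unique_ordered(phrases):
    """Split phrases into words and drop repeats, keeping first-seen order."""
    words = [word for phrase in phrases for word in phrase.split()]
    # dict.fromkeys keeps one key per distinct word, in insertion order.
    return list(dict.fromkeys(words))

# First row from the question, used here as plain Python data.
row = ["defence possessed russia china", "factors driving china modernise"]
result = split_unique_ordered(row)
print(result)
# ['defence', 'possessed', 'russia', 'china', 'factors', 'driving', 'modernise']
```

On the dataframe it would be applied the same way as the lambdas above: df['Filtered_data'].apply(split_unique_ordered).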