I have a dataframe like this:
Filtered_data
['defence possessed russia china', 'factors driving china modernise']
['force bolster pentagon', 'strike capabilities pentagon congress detailing china']
['missiles warheads', 'deterrent face continued advances']
......
......
I just want to split each list element into sub-elements (tokenized words). So the output I'm looking for is:
Filtered_data
[defence, possessed, russia, factors, driving, china, modernise]
[force, bolster, strike, capabilities, pentagon, congress, detailing, china]
[missiles, warheads, deterrent, face, continued, advances]
Here is the code I have tried, but it does not give the output I want:
for text in df['Filtered_data'].iteritems():
    for i in text.split():
        print(i)
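For reference, the frame above can be reproduced like this (a sketch, assuming each cell holds a list of strings as shown):

import pandas as pd

df = pd.DataFrame({'Filtered_data': [
    ['defence possessed russia china', 'factors driving china modernise'],
    ['force bolster pentagon', 'strike capabilities pentagon congress detailing china'],
    ['missiles warheads', 'deterrent face continued advances'],
]})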
Use a list comprehension with split and flattening:
df['Filtered_data'] = df['Filtered_data'].apply(lambda x: [z for y in x for z in y.split()])
print (df)
                                       Filtered_data
0 [defence, possessed, russia, china, factors, d...
1 [force, bolster, pentagon, strike, capabilitie...
2 [missiles, warheads, deterrent, face, continue...
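The comprehension iterates the strings in each row's list and splits them on whitespace; as a sketch, it is equivalent to this plain nested loop over the original lists:

out = []
for x in df['Filtered_data']:
    words = []
    for y in x:                  # each string in the row's list
        words.extend(y.split())  # split on whitespace and flatten
    out.append(words)
df['Filtered_data'] = out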
EDIT:
For unique values, the standard way is to use set:
df['Filtered_data'] = df['Filtered_data'].apply(lambda x: list(set([z for y in x for z in y.split()])))
print (df)
                                       Filtered_data
0 [russia, factors, defence, driving, china, mod...
1 [capabilities, detailing, china, force, pentag...
2 [deterrent, advances, face, warheads, missiles...
But if the ordering of values is important, use pandas.unique:
df['Filtered_data'] = df['Filtered_data'].apply(lambda x: pd.unique([z for y in x for z in y.split()]).tolist())
print (df)
                                       Filtered_data
0 [defence, possessed, russia, china, factors, d...
1 [force, bolster, pentagon, strike, capabilitie...
2 [missiles, warheads, deterrent, face, continue...
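If you prefer to avoid pandas for the deduplication, an order-preserving alternative (assuming Python 3.7+, where dicts keep insertion order) is dict.fromkeys:

df['Filtered_data'] = df['Filtered_data'].apply(
    lambda x: list(dict.fromkeys(z for y in x for z in y.split()))
)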