Apply a function to each element of a pandas series

2024/11/17 22:08:41

I am trying to tokenize each sentence in my pandas Series. I tried using apply, as shown in the documentation, but it didn't work:

x.apply(nltk.word_tokenize)

Calling nltk.word_tokenize(x) directly doesn't work either, because x is not a string. Does anyone have any idea?

Edit: x is a pandas Series of sentences:

0       A very, very, very slow-moving, aimless movie ...
1       Not sure who was more lost - the flat characte...
2       Attempting artiness with black & white and cle...

With x.apply(nltk.word_tokenize) it returns exactly the same:

0       A very, very, very slow-moving, aimless movie ...
1       Not sure who was more lost - the flat characte...
2       Attempting artiness with black & white and cle...

With nltk.word_tokenize(x) the error is:

TypeError: expected string or bytes-like object

Answer

Question: are you saving your intermediate results? x.apply() does not modify x in place; it returns a new Series with the function applied to each element. See below for an example of how this might be affecting your code.

We'll start by confirming that word_tokenize() works on a sample snippet of text.

>>> import pandas as pd
>>> from nltk import word_tokenize
>>> word_tokenize('hello how are you')   # confirming that word_tokenize works.
['hello', 'how', 'are', 'you']            

Then let's create a Series to play with.

>>> s = pd.Series(['hello how are you','lorem ipsum isumming lorems','more stuff in a line'])
>>> print(s)
0              hello how are you
1    lorem ipsum isumming lorems
2           more stuff in a line
dtype: object

Calling apply() with word_tokenize at an interactive Python prompt shows the tokenized result, but nothing indicates that this is a copy and not a permanent change to s:

>>> s.apply(word_tokenize)
0              [hello, how, are, you]
1    [lorem, ipsum, isumming, lorems]
2          [more, stuff, in, a, line]
dtype: object

In fact, we can print s to show that it is unchanged...

>>> print(s)
0              hello how are you
1    lorem ipsum isumming lorems
2           more stuff in a line
dtype: object

If, instead, we assign the result of the apply() call to a name, in this case wt, the result is saved, as we can see by printing wt.

>>> wt = s.apply(word_tokenize)
>>> print(wt)
0              [hello, how, are, you]
1    [lorem, ipsum, isumming, lorems]
2          [more, stuff, in, a, line]
dtype: object
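If the original strings are no longer needed, the same name can simply be rebound to the result. The sketch below assumes a hypothetical simple_tokenize() helper that splits on whitespace; it stands in for nltk.word_tokenize only so the snippet runs without downloading NLTK's tokenizer data, and it does not split punctuation the way NLTK does.

```python
import pandas as pd


def simple_tokenize(text):
    # Hypothetical stand-in for nltk.word_tokenize: whitespace split only.
    return text.split()


s = pd.Series(['hello how are you', 'more stuff in a line'])

# apply() returns a new Series, so rebind the name to keep the result.
s = s.apply(simple_tokenize)

print(s.tolist())
```

With NLTK installed and its tokenizer data available, passing word_tokenize to apply() in place of simple_tokenize should follow the same pattern.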

At an interactive prompt this is easy to notice, because the returned copy is echoed back. In a script, the fact that apply() produced a copy can pass silently and without indication.
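As a rough illustration of how this can go unnoticed in a script, the sketch below calls apply() once without keeping the result and once with an assignment, then checks the element types. The names reviews, tokens and simple_tokenize are made up for this example, and simple_tokenize is a whitespace-splitting stand-in for nltk.word_tokenize, not the real tokenizer.

```python
import pandas as pd


def simple_tokenize(text):
    # Hypothetical stand-in for nltk.word_tokenize: whitespace split only.
    return text.split()


reviews = pd.Series(['a very slow movie', 'not sure who was more lost'])

# The result is discarded here: no error is raised and reviews is untouched.
reviews.apply(simple_tokenize)
still_strings = all(isinstance(value, str) for value in reviews)

# Keeping the returned Series is what actually saves the tokens.
tokens = reviews.apply(simple_tokenize)
now_lists = all(isinstance(value, list) for value in tokens)

print(still_strings, now_lists)
```

A quick type check like this is one way to confirm in a script that the tokenized Series is the one being used downstream.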
