I am trying to tokenize each sentence of my pandas Series.
I tried to do it as shown in the documentation, using apply, but it didn't work:
x.apply(nltk.word_tokenize)
If I just use nltk.word_tokenize(x)
it doesn't work either, because x
is not a string. Does anyone have any ideas?
Edited: x is a pandas Series with sentences:
0 A very, very, very slow-moving, aimless movie ...
1 Not sure who was more lost - the flat characte...
2 Attempting artiness with black & white and cle...
With x.apply(nltk.word_tokenize) it returns exactly the same:
0 A very, very, very slow-moving, aimless movie ...
1 Not sure who was more lost - the flat characte...
2 Attempting artiness with black & white and cle...
With nltk.word_tokenize(x) the error is:
TypeError: expected string or bytes-like object
Question: are you saving your intermediate results? x.apply()
returns a new Series
with the transformation applied to each element; it does not modify the original Series
in place. See below for an example of how this might be affecting your code...
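To see the copy semantics in isolation, here is a minimal sketch that does not depend on nltk; str.upper stands in for any per-element transformation such as word_tokenize.

```python
import pandas as pd

s = pd.Series(['hello', 'world'])

# apply() returns a NEW Series; s itself is untouched.
s.apply(str.upper)
print(list(s))          # ['hello', 'world'] -- unchanged

# Reassigning keeps the transformed copy.
s = s.apply(str.upper)
print(list(s))          # ['HELLO', 'WORLD']
```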
We'll start by confirming that word_tokenize()
works on a sample snippet of text.
>>> import pandas as pd
>>> from nltk import word_tokenize
>>> word_tokenize('hello how are you') # confirming that word_tokenize works.
['hello', 'how', 'are', 'you']
Then let's create a Series to play with.
>>> s = pd.Series(['hello how are you','lorem ipsum isumming lorems','more stuff in a line'])
>>> print(s)
0 hello how are you
1 lorem ipsum isumming lorems
2 more stuff in a line
dtype: object
Executing word_tokenize
via the apply()
function at an interactive Python prompt shows that it tokenizes each element...
But nothing indicates that the result is a new Series, not a permanent change to s
>>> s.apply(word_tokenize)
0 [hello, how, are, you]
1 [lorem, ipsum, isumming, lorems]
2 [more, stuff, in, a, line]
dtype: object
In fact, we can print s
to show that it is unchanged...
>>> print(s)
0 hello how are you
1 lorem ipsum isumming lorems
2 more stuff in a line
dtype: object
If, instead, we assign the result of the apply()
call to a name, in this case wt
, the results are saved permanently, which we can see by printing wt
.
>>> wt = s.apply(word_tokenize)
>>> print(wt)
0 [hello, how, are, you]
1 [lorem, ipsum, isumming, lorems]
2 [more, stuff, in, a, line]
dtype: object
Doing this at an interactive prompt makes such a condition easy to spot, but in a script the fact that apply() produced a new, unsaved Series can pass silently and without indication.