Passing a pandas dataframe column to an NLTK tokenizer

2024/9/23 3:23:39

I have a pandas dataframe raw_df with 2 columns, ID and sentences. I need to convert each sentence to a string. The code below produces no errors and says datatype of rule is "object."

raw_df['sentences'] = raw_df.sentences.astype(str)
raw.df.sentences.dtypes

Out: dtype('O')

Then, I try to tokenize sentences and get a TypeError that the method is expecting a string or bytes-like object. What am I doing wrong?

raw_sentences=tokenizer.tokenize(raw_df)

Same TypeError for

raw_sentences = nltk.word_tokenize(raw_df)
Answer

I'm assuming this is an NLTK tokenizer. I believe these work by taking sentences as input and returning tokenised words as output.

What you're passing is raw_df - a pd.DataFrame object, not a str. You cannot expect it to apply the function row-wise, without telling it to, yourself. There's a function called apply for that.

raw_df['tokenized_sentences'] = raw_df['sentences'].apply(tokenizer.tokenize)

Assuming this works without any hitches, tokenized_sentences will be a column of lists.

Since you're performing text processing on DataFrames, I'd recommend taking a look at another answer of mine here: Applying NLTK-based text pre-proccessing on a pandas dataframe

https://en.xdnf.cn/q/71873.html

Related Q&A

SWIG - Wrap C string array to python list

I was wondering what is the correct way to wrap an array of strings in C to a Python list using SWIG.The array is inside a struct :typedef struct {char** my_array;char* some_string; }Foo;SWIG automati…

How to show an Image with pillow and update it?

I want to show an image recreated from an img-vector, everything fine. now I edit the Vector and want to show the new image, and that multiple times per second. My actual code open tons of windows, wit…

How do I map Alt Gr key combinations in vim?

Suppose I wanted to map the command :!python % <ENTER> to pressing the keys Alt Gr and j together?

cannot import name get_user_model

I use django-registrations and while I add this code in my admin.pyfrom django.contrib import adminfrom customer.models import Customerfrom .models import UserProfilefrom django.contrib.auth.admin impo…

pytest: Best Way To Add Long Test Description in the Report

By default pytest use test function names or test files names in pytest reportsis there any Best way to add test description (Long test name) in the report with out renaming the files or functions usin…

Adding a calculated column to pandas dataframe

I am completely new to Python, pandas and programming in general, and I cannot figure out the following:I have accessed a database with the help of pandas and I have put the data from the query into a …

Scipy: Centroid of convex hull

how can I calculate the centroid of a convex hull using python and scipy? All I found are methods for computing Area and Volume.regards,frank.

Creating a montage of pictures in python

I have no experience with python, but the owner of this script is not responding.When I drag my photos over this script, to create a montage, it ends up cutting off half of the last photo on the right …

stop python program when ssh pipe is broken

Im writing a python script with an infinite while loop that I am running over ssh. I would like the script to terminate when someone kills ssh. For example:The script (script.py):while True:# do someth…

How do I export a TensorFlow model as a .tflite file?

Background information:I have written a TensorFlow model very similar to the premade iris classification model provided by TensorFlow. The differences are relatively minor: I am classifying football ex…