Should I perform both lemmatization and stemming?

2024/10/11 22:25:19

I'm writing a text classification system in Python. This is what I'm doing to canonicalize each token:

from nltk.stem import PorterStemmer, WordNetLemmatizer
lem, stem = WordNetLemmatizer(), PorterStemmer()
for doc in corpus:
    for word in doc:
        lemma = stem.stem(lem.lemmatize(word))  # lemmatize first, then stem

The reason I don't want to just lemmatize is that I noticed WordNetLemmatizer wasn't handling some common inflections. In the case of verbs, for example, lem.lemmatize('walking') returns 'walking'.

Is it wise to perform both stemming and lemmatization? Or is it redundant? Do researchers typically do one or the other, and not both?

Answer

From my point of view, doing both stemming and lemmatization, or only one of them, will produce only slight differences. I recommend using just stemming, because lemmatization sometimes needs the 'pos' argument to work more precisely.

For example, if you want to lemmatize "better", you should explicitly indicate pos: print(lemmatizer.lemmatize("better", pos="a"))

If pos is not supplied, the default is "n" (noun), which is why lem.lemmatize('walking') returns the word unchanged.
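If you do want to lemmatize, one common way to supply pos automatically is to tag each token first and translate the tag. The following is only a minimal sketch: the helper names penn_to_wordnet and lemmatize_tagged are made up for illustration, and it assumes the tags come from a Penn Treebank tagger such as nltk.pos_tag. Any tag that is not an adjective, verb, noun, or adverb falls back to noun, matching the lemmatizer's default.

```python
# Hypothetical helpers (not part of NLTK): translate Penn Treebank tags into
# the single-letter codes accepted by WordNetLemmatizer.lemmatize(word, pos=...).
_PENN_PREFIX_TO_WORDNET = {"J": "a", "V": "v", "N": "n", "R": "r"}


def penn_to_wordnet(tag):
    """Return a WordNet POS code for a Penn Treebank tag, defaulting to noun."""
    return _PENN_PREFIX_TO_WORDNET.get(tag[:1], "n")


def lemmatize_tagged(tagged_tokens, lemmatizer):
    """Lemmatize (word, penn_tag) pairs, passing the POS implied by each tag.

    `lemmatizer` is any object with a lemmatize(word, pos=...) method,
    for example an nltk.stem.WordNetLemmatizer instance.
    """
    return [
        lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag))
        for word, tag in tagged_tokens
    ]


# Usage with NLTK (assumes NLTK and its WordNet and tagger data are installed):
#   import nltk
#   from nltk.stem import WordNetLemmatizer
#   tagged = nltk.pos_tag(["she", "was", "walking"])
#   print(lemmatize_tagged(tagged, WordNetLemmatizer()))
```

With the tag passed along, there is less reason to stem the lemmatizer's output afterwards, though whether that helps your classifier is something to measure on your own data.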

