Regex stemmer code explanation

2024/11/15 23:19:47

Can someone please explain what this code does?

    import re

    def stemmer(word):
        [(stem, end)] = re.findall('^(.*ss|.*?)(s)?$', word)
        return stem

Answer

It splits a word into two parts: stem and end. There are three cases:

  1. The word ends with ss (or even more s): stem <- word and end <- ""
  2. The word ends with a single s: stem <- word without "s" and end <- "s"
  3. The word does not end with s: stem <- word and end <- ""
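The three cases can be checked with a short script. This is only a sketch: the function is the one from the question (written with a raw string for the pattern), and the sample words are chosen here purely for illustration.

```python
import re

def stemmer(word):
    # The pattern is anchored with ^ and $, so findall yields exactly one
    # (stem, end) tuple for a single-line word.
    [(stem, end)] = re.findall(r'^(.*ss|.*?)(s)?$', word)
    return stem

# One illustrative word per case: ends in "ss", ends in a single "s", no final "s".
for word in ["class", "cats", "cat"]:
    print(word, "->", stemmer(word))
```

Only the stem is returned, so cases 1 and 3 give back the word unchanged and case 2 drops the final s.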

This is done by a regular expression that matches the whole word (because of the ^ and $ anchors). The first group (the stem) is either as much as possible ending in ss (.*ss) or, if that is not possible, as little as possible (.*?). An optional trailing s is then captured as the end part.

Note that in the first case (as much as possible ending in ss) there can never be an additional s left over for the end part, because the greedy .* has already pulled every trailing s into the stem.
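To see both groups rather than just the stem, the same pattern can be written out with re.VERBOSE and inspected with match. The helper name split_word is made up for this illustration and is not part of the original code; it assumes a single-line word, since the pattern does not match across newlines.

```python
import re

# Same pattern as in the question, spread out with comments via re.VERBOSE.
PATTERN = re.compile(r"""
    ^            # start of the word
    (            # group 1: the stem
        .*ss     #   greedy: as much as possible, ending in "ss"
      |          #   or, if that fails,
        .*?      #   lazy: as little as possible
    )
    (s)?         # group 2: an optional single trailing "s"
    $            # end of the word
""", re.VERBOSE)

def split_word(word):
    """Hypothetical helper: return (stem, end) for a single-line word."""
    match = PATTERN.match(word)
    # group(2) is None when the optional "s" did not take part in the match.
    return match.group(1), match.group(2) or ""

for word in ["class", "classs", "cats", "glasses"]:
    print(word, "->", split_word(word))
```

For "classs" the whole word ends up in the stem and the end part stays empty, which is the behaviour the note describes. For "glasses" the ss is not at the end, so the lazy branch applies and only the final s is split off.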

https://en.xdnf.cn/q/120396.html
