Detect abbreviations in the text in python

2024/10/1 15:16:38

I want to find abbreviations in the text and remove it. What I am currently doing is identifying consecutive capital letters and remove them.

But I see that it does not remove abbreviations such as MOOCs, M.O.O.C, M.O.O.Cs. Is there an easy way of doing this in python? Or are there any libraries that I can use instead?

Answer

The re regex library is probably the tool for the job.

In order to remove every string of consecutive uppercase letters, the following code can be used:

import re
mytext = "hello, look an ACRONYM"
mytext = re.sub(r"\b[A-Z]{2,}\b", "", mytext)

Here, the regex "\b[A-Z]{2,}\b" searches for multiple consecutive (indicated by [...]{2,}) capital letters (A-Z), forming a complete word (\b...\b). It then replaces them with the second string, "".

The convenient thing about regex is how easily it can be modified for more complex cases. For example:

mytext = re.sub(r"\b[A-Z\.]{2,}\b", "", mytext)

Will replace consecutive uppercase letters and full stops, removing acronyms like A.B.C.D. as well as ABCD. The \ before the . is necessary as . otherwise is used by regex as a kind of wildcard.

The ? specifier could also be used to remove acronyms that end in s, for example:

mytext = re.sub(r"\b[A-Z\.]{2,}s?\b", "", mytext)

This regex will remove acronyms like ABCD, A.B.C.D, and even A.B.C.Ds. If other forms of acronym need to be removed, the regex can easily be modified to accommodate them.

The re library also includes functions like findall, or the match function, which allow for programs to locate and process each acronym individually. This might come in handy if you want to, for example, look at a list of the acronyms being removed and check there are no legitimate words there.

https://en.xdnf.cn/q/70950.html

Related Q&A

filtering of tweets received from statuses/filter (streaming API)

I have N different keywords that i am tracking (for sake of simplicity, let N=3). So in GET statuses/filter, I will give 3 keywords in the "track" argument.Now the tweets that i will be recei…

UndefinedError: current_user is undefined

I have a app with flask which works before But Now I use Blueprint in it and try to run it but got the error so i wonder that is the problem Blueprint that g.user Not working? and how can I fix it Thn…

Scipy filter with multi-dimensional (or non-scalar) output

Is there a filter similar to ndimages generic_filter that supports vector output? I did not manage to make scipy.ndimage.filters.generic_filter return more than a scalar. Uncomment the line in the cod…

How do I stop execution inside exec command in Python 3?

I have a following code:code = """ print("foo")if True: returnprint("bar") """exec(code) print(This should still be executed)If I run it I get:Tracebac…

sqlalchemy concurrency update issue

I have a table, jobs, with fields id, rank, and datetime started in a MySQL InnoDB database. Each time a process gets a job, it "checks out" that job be marking it started, so that no other p…

Python matplotlib: Change axis labels/legend from bold to regular weight

Im trying to make some publication-quality plots, but I have encountered a small problem. It seems by default that matplotlib axis labels and legend entries are weighted heavier than the axis tick mark…

Negative extra_requires in Python setup.py

Id like to make a Python package that installs a dependency by default unless the user specially signals they do not want that.Example:pip install package[no-django]Does current pip and setup.py mechan…

Get Type in Robot Framework

Could you tell me about how to get the variable type in Robot Framework.${ABC} Set Variable Test ${XYZ} Set Variable 1233Remark: Get the variable Type such as string, intget ${ABC} type = strin…

Keras 2, TypeError: cant pickle _thread.lock objects

I am using Keras to create an ANN and do a grid search on the network. I have encountered the following error while running the code below:model = KerasClassifier(build_fn=create_model(input_dim), verb…

How to remove minimize/maximize buttons while preserving the icon?

Is it possible to display the icon for my toplevel and root window after removing the minimize and maximize buttons? I tried using -toolwindow but the icon cant be displayed afterwards. Is there anoth…