Count occurrences of a couple of specific words

2024/10/3 21:24:20

I have a list of words, let's say ["foo", "bar", "baz"], and a large string in which these words may occur.

For each word in the list I currently call the "string".count("word") method. This works OK, but seems rather inefficient: for every extra word added to the list, the entire string must be iterated over one more time.
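For concreteness, the current approach looks roughly like the sketch below. The name count_each and the sample string are only illustrative, not part of my real code.

```python
def count_each(words, text):
    # One call to str.count per word, so the whole text is scanned len(words) times.
    # Note that str.count matches substrings, so "foobar" also counts for "foo".
    return {w: text.count(w) for w in words}


words = ["foo", "bar", "baz"]
text = "foo bar baz bar quux foo bla bla"
print(count_each(words, text))
```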

Is there a better way to do this, or should I implement a custom method that iterates over the large string a single time, checking at each character whether one of the words in the list has been reached?

To be clear:

  • I want the number of occurrences per word in the list.
  • The string to search in is different each time and consists of about 10,000 characters
  • The list of words is constant
  • The words in the list of words can contain whitespace

Answer

Make a dict-typed frequency table for your words, then iterate over the words in your string.

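import re

vocab = ["foo", "bar", "baz"]
s = "foo bar baz bar quux foo bla bla"

# Start every vocabulary word at zero so absent words still appear in the result.
wordcount = dict((x, 0) for x in vocab)
# Single pass over the string: tokenize into words and count the ones we care about.
for w in re.findall(r"\w+", s):
    if w in wordcount:
        wordcount[w] += 1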

Edit: if the "words" in your list contain whitespace, you can instead build an RE out of them:

import re
from collections import Counter

vocab = ["foo bar", "baz"]
# One alternation of all entries, each anchored at word boundaries.
r = re.compile("|".join(r"\b%s\b" % w for w in vocab))
wordcount = Counter(re.findall(r, s))

Explanation: this builds the RE r'\bfoo bar\b|\bbaz\b' from the vocabulary. findall then returns the matches in order of appearance (['foo bar', 'baz'] for the string s from the first snippet), and the Counter (Python 2.7+) counts the occurrences of each distinct element in it. Watch out: your list of words should not contain characters that are special to REs, such as ()[]\, unless you escape each word with re.escape first.
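Since the word list is constant and only the string changes, the RE can be compiled once and reused. The following is a minimal sketch along those lines, not a definitive implementation: make_counter and count_words are hypothetical names, re.escape is used so that special characters in the words are matched literally, and longer entries are placed first in the alternation on the assumption that "foo bar" should win over "foo" if both are in the list.

```python
import re
from collections import Counter


def make_counter(vocab):
    """Build a counting function for a fixed vocabulary (compiled once)."""
    # Longest entries first, so a longer phrase is preferred over its prefix.
    ordered = sorted(vocab, key=len, reverse=True)
    # re.escape makes characters such as "." or "(" match literally.
    pattern = re.compile("|".join(r"\b%s\b" % re.escape(w) for w in ordered))

    def count(s):
        found = Counter(pattern.findall(s))
        # Report every vocabulary entry, including those with zero matches.
        return {w: found[w] for w in vocab}

    return count


count_words = make_counter(["foo bar", "baz", "a.b"])
print(count_words("foo bar baz bar axb a.b quux"))
```

Two caveats: findall counts non-overlapping matches, so results can differ from str.count when entries overlap, and \b only behaves as expected for entries that start and end with a word character.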
