Filter items that occur only once in a very large list

2024/9/20 18:35:18

I have a large list (over 1,000,000 items) of English words:

tokens = ["today", "good", "computer", "people", "good", ... ]

I'd like to get all the items that occur only once in the list.

Right now I'm using:

tokens_once = set(word for word in set(tokens) if tokens.count(word) == 1)

But it's really slow. How can I make this faster?

Answer

You iterate over the unique words and, for each one, tokens.count(word) scans the whole list again, which makes it O(N²) in the worst case. If you replace the count with a Counter, you iterate once over the list and then once over the unique elements, which is at worst O(2N), i.e. O(N).

from collections import Counter

tokens = ["today", "good", "computer", "people", "good"]
# Counter tallies every token in one pass; keep the ones counted exactly once.
single_tokens = [k for k, v in Counter(tokens).items() if v == 1]
# single_tokens == ['today', 'computer', 'people']
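
If you want a set back, as in the question, the same idea can be wrapped in a small helper. This is only a minimal sketch: the function name tokens_seen_once is made up for illustration, and it assumes Python 3 and hashable items.

```python
from collections import Counter


def tokens_seen_once(tokens):
    """Return the set of items that appear exactly once in `tokens`.

    Hypothetical helper name, used here only for illustration.
    """
    counts = Counter(tokens)  # single pass over the full list
    # Second pass runs over the unique items only.
    return {word for word, n in counts.items() if n == 1}


tokens = ["today", "good", "computer", "people", "good"]
tokens_once = tokens_seen_once(tokens)
print(tokens_once == {"today", "computer", "people"})  # True
```

Both passes are linear, so the total work grows with the length of the list rather than with its square.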
