How to find collocations in text with Python

2024/9/25 21:25:37

How do you find collocations in text? A collocation is a sequence of words that occurs together unusually often. NLTK has a bigrams function that returns word pairs.

>>> from nltk import bigrams
>>> list(bigrams(['more', 'is', 'said', 'than', 'done']))
[('more', 'is'), ('is', 'said'), ('said', 'than'), ('than', 'done')]

What's left is to find the bigrams that occur more often than the frequencies of their individual words would predict. Any ideas on how to put that into code?

Answer

Try NLTK. You will mostly be interested in nltk.collocations.BigramCollocationFinder, but here is a quick demonstration to show you how to get started:

>>> import nltk
>>> def tokenize(sentences):
...     for sent in nltk.sent_tokenize(sentences.lower()):
...         for word in nltk.word_tokenize(sent):
...             yield word
...
>>> nltk.Text(tkn for tkn in tokenize('mary had a little lamb.'))
<Text: mary had a little lamb ....>
>>> text = nltk.Text(tkn for tkn in tokenize('mary had a little lamb.'))

There are none in this small segment, but here goes:

>>> text.collocations(num=20)
Building collocations list
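
If you want to see the idea behind the scoring without NLTK, here is a minimal sketch using only the standard library. It assumes pointwise mutual information (PMI) as the measure of "more often than the individual word frequencies predict"; that is one common choice, not necessarily what text.collocations() uses internally. The function name pmi_bigrams, the min_count parameter and the toy sentence are made up for this illustration.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration only; it is not part of NLTK.
    """
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_bigrams = max(n_tokens - 1, 1)

    scores = {}
    for (w1, w2), count in bigram_counts.items():
        # Rare pairs get inflated PMI scores, so drop them first.
        if count < min_count:
            continue
        p_pair = count / n_bigrams
        p_w1 = unigram_counts[w1] / n_tokens
        p_w2 = unigram_counts[w2] / n_tokens
        # PMI > 0 means the pair occurs more often than chance predicts.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy input, already lowercased and split on whitespace.
tokens = ("new york is big . new york is busy . "
          "the city is big . the park is busy .").split()
ranked = pmi_bigrams(tokens, min_count=2)
print(ranked[0][0])  # top pair: ('new', 'york')
```

On real text you would feed in the tokens from tokenize() above and raise min_count, because PMI tends to favour pairs of rare words.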
