Question 1

I am trying to get a phrase count from a text file but so far I am only able to obtain a word count (see below). I need to extend this logic to count the number of times a two-word phrase appears in the text file.

From my understanding, phrases can be defined and grouped using NLTK. I believe the collocations functionality is what I need to get the desired result, but after reading the NLTK documentation I'm not sure how to implement it. Any tips or help would be greatly appreciated.

import re
import string
frequency = {}
document_text = open('Words.txt', 'r')
text_string = document_text.read().lower()
match_pattern = re.findall(r'\b[a-z]{3,15}\b', text_string)

for word in match_pattern:
    count = frequency.get(word, 0)
    frequency[word] = count + 1

frequency_list = frequency.keys()

for words in frequency_list:
    print(words, frequency[words])

Answer

You can get all the two-word phrases using NLTK's collocations module, which identifies words that often appear next to each other in a corpus.

To find the two-word phrases, you first need to count how often each word occurs and how often it occurs next to each other word. NLTK's BigramCollocationFinder class does this. Here's how to find the bigram collocations:

import re
import string
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.collocations import BigramCollocationFinder, BigramAssocMeasures

frequency = {}
document_text = open('Words.txt', 'r')
text_string = document_text.read().lower()
# Keep only words of 3 to 15 lowercase letters, in document order
match_pattern = re.findall(r'\b[a-z]{3,15}\b', text_string)

# Build the finder from the word list, then print the two bigrams with the highest PMI score
finder = BigramCollocationFinder.from_words(match_pattern)
bigram_measures = nltk.collocations.BigramAssocMeasures()
print(finder.nbest(bigram_measures.pmi, 2))
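
Note that nbest with the PMI measure ranks pairs by how strongly the two words are associated, not by how often the pair occurs. If a plain count per two-word phrase is what's wanted, the following is a minimal sketch using only the standard library. The helper name count_bigrams and the sample sentence are made up for illustration; the regex is the one from the question. As far as I know, the finder also exposes its raw pair counts as finder.ngram_fd, which should give the same kind of result inside NLTK.

```python
import re
from collections import Counter


def count_bigrams(text):
    """Count adjacent two-word phrases in text (hypothetical helper)."""
    # Same tokenization as the question: words of 3 to 15 letters, lowercased
    words = re.findall(r'\b[a-z]{3,15}\b', text.lower())
    # Pair each word with the word that follows it and tally the pairs
    return Counter(zip(words, words[1:]))


# Illustrative input; in the question this would be the contents of Words.txt
sample = "the quick fox saw the quick dog and the quick fox ran"
counts = count_bigrams(sample)

# Print the three most frequent phrases with their counts
for (first, second), n in counts.most_common(3):
    print(first, second, n)
```

Because the regex drops words shorter than three letters, two words separated only by a short word (such as "of" or "a") are counted as adjacent. Loosen the pattern if that matters for your text.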

NLTK Collocations Docs: http://www.nltk.org/api/nltk.html?highlight=collocation#module-nltk.collocations

Counting phrases in Python using NLTK
