Question 1

I want to write a function that returns the frequency of each n-gram in a given text. Can anyone help? Here is the code I wrote for counting the frequency of 2-grams:

from nltk import FreqDist
from nltk.util import ngrams

def compute_freq():
    textfile = "please write a function"
    bigramfdist = FreqDist()
    # splitlines() yields whole lines; iterating over the string itself
    # would yield single characters, which the len() check then skips
    for line in textfile.splitlines():
        if len(line) > 1:
            tokens = line.strip().split(' ')
            bigrams = ngrams(tokens, 2)
            bigramfdist.update(bigrams)
    return bigramfdist

bigramfdist = compute_freq()
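If you'd rather not depend on NLTK, the same loop generalizes to any n with only the standard library. This is a minimal sketch, not the asker's code: the function name `count_ngrams` is hypothetical, and it assumes tokens are separated by whitespace (`str.split()`), which differs from NLTK's `word_tokenize` (for example, punctuation stays attached to words).

```python
from collections import Counter


def count_ngrams(text, n=2):
    """Count the n-grams on each line of `text` (hypothetical helper)."""
    counts = Counter()
    # splitlines() yields whole lines; iterating over the string directly
    # would yield single characters instead
    for line in text.splitlines():
        tokens = line.split()  # whitespace tokenization (an assumption)
        # zip over n shifted views of the token list gives a sliding window
        # of n consecutive tokens; lines shorter than n produce nothing
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


print(count_ngrams("please write a function", n=2))
```

N-grams are counted per line here, so no n-gram spans a line break.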

Question 2

I don't see an expected output in the question, so I assume this is what you need:

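import nltk

def compute_freq(sentence, n_value=2):
    # word_tokenize needs NLTK's Punkt tokenizer data; if it is missing,
    # download it once with nltk.download('punkt') ('punkt_tab' on newer NLTK)
    tokens = nltk.word_tokenize(sentence)
    # nltk.ngrams yields tuples of n_value consecutive tokens
    ngrams = nltk.ngrams(tokens, n_value)
    # FreqDist counts how many times each n-gram tuple occurs
    ngram_fdist = nltk.FreqDist(ngrams)
    return ngram_fdist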

By default this function returns the frequency distribution of bigrams. For example:

text = "This is an example sentence."
freq_dist = compute_freq(text)

Now, freq_dist would look like this:

FreqDist({('is', 'an'): 1, ('example', 'sentence'): 1, ('an', 'example'): 1, ('This', 'is'): 1, ('sentence', '.'): 1})
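Because NLTK's `FreqDist` is a subclass of `collections.Counter`, you can query the result like a dictionary: index it with an n-gram tuple to get one count, or call `most_common()` to rank the n-grams. The sketch below builds a plain `Counter` so it runs without NLTK installed; the token list is typed in by hand as an assumption, matching the tokens in the example output above.

```python
from collections import Counter

# Tokens as they appear in the example output (typed in by hand here)
tokens = ["This", "is", "an", "example", "sentence", "."]
bigrams = Counter(zip(tokens, tokens[1:]))

# Look up the count of one specific bigram
print(bigrams[("is", "an")])

# A bigram that never occurs gives 0 instead of raising KeyError
print(bigrams[("is", "a")])

# Most frequent bigrams first (ties keep first-seen order on Python 3.7+)
for gram, count in bigrams.most_common(3):
    print(" ".join(gram), count)
```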

From here you can print the keys and values like so:

for k, v in freq_dist.items():
    print(k, v)

which prints:

('is', 'an') 1
('example', 'sentence') 1
('an', 'example') 1
('This', 'is') 1
('sentence', '.') 1

For anything other than bigrams, just change the n_value argument when calling the function. For example,

freq_dist = compute_freq(text, n_value=3)  # gives the trigram distribution

Printed with the same loop:

('example', 'sentence', '.') 1
('an', 'example', 'sentence') 1
('This', 'is', 'an') 1
('is', 'an', 'example') 1
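If by "frequency" you mean each n-gram's share of all n-grams rather than its raw count, divide each count by the total. NLTK's `FreqDist` offers this through its `freq()` method; the sketch below does the same with the standard library for every n from 1 up to a chosen maximum, so it runs without NLTK. The helper name `relative_ngram_freqs` and the whitespace tokenization in the usage line are assumptions for illustration.

```python
from collections import Counter


def relative_ngram_freqs(tokens, max_n=3):
    """Map each n in 1..max_n to {ngram: count / total} (hypothetical helper)."""
    result = {}
    for n in range(1, max_n + 1):
        # sliding window of n consecutive tokens
        counts = Counter(zip(*(tokens[i:] for i in range(n))))
        total = sum(counts.values())
        # avoid dividing by zero when there are fewer than n tokens
        result[n] = {gram: c / total for gram, c in counts.items()} if total else {}
    return result


freqs = relative_ngram_freqs("the cat saw the dog".split(), max_n=2)
print(freqs[1][("the",)])        # "the" is 2 of the 5 unigrams
print(freqs[2][("the", "cat")])  # 1 of the 4 bigrams
```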

