So I am taking a natural language processing class and I need to create a trigram language model to generate random text that looks "realistic" to a certain degree based off of some sample data.
Essentially, I need to create a "trigram" structure to hold the various three-word combinations. My professor hinted that this can be done with a dictionary of dictionaries of dictionaries, which I attempted to create using:
trigram = defaultdict( defaultdict(defaultdict(int)))
However I get an error that says:
trigram = defaultdict( dict(dict(int)))
TypeError: 'type' object is not iterable
How would I go about creating a three-layer nested dictionary, i.e. a dictionary of dictionaries of dictionaries of int values?
I guess people vote down a question on Stack Overflow if they don't know how to answer it. I'll add some background to better explain the question for those willing to help.
This trigram is used to keep track of three-word patterns. They are used in text-processing software and almost everywhere throughout natural language processing (think Siri or Google Now).
If we designate the three levels of dictionaries as dict1, dict2, and dict3, then parsing a text file and reading the statement "The boy runs" would produce the following:
dict1 has a key of "the". Accessing that key returns dict2, which contains the key "boy". Accessing that key returns the final dict3, which contains the key "runs". Accessing that key returns the value 1.
This symbolizes that "the boy runs" has appeared 1 time in this text. If we encounter it again, we follow the same process and increment the value from 1 to 2. If we encounter "the girl walks", then dict2 (the dictionary stored under the "the" key) will now contain another key for "girl", whose dict3 has a key of "walks" with a value of 1, and so forth. Eventually, after parsing a ton of text (and keeping track of the word counts), you will have a trigram which can determine the likelihood of a certain starting word leading to a three-word combination, based on how frequently those combinations appeared in the previously parsed text.
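For reference, the counting process described above could be sketched roughly as follows. This is only a minimal sketch: it assumes each nesting level is given a callable default factory (a lambda here), and the helper name count_trigrams is made up for illustration.

```python
from collections import defaultdict

# Assumption: each level gets a callable factory, so the inner
# defaultdicts are wrapped in lambdas instead of being passed as instances.
trigram = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

def count_trigrams(words, counts):
    """Hypothetical helper: bump counts[w1][w2][w3] for each run of three words."""
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        counts[w1][w2][w3] += 1

# The example sentences from the description, lowercased and split on spaces.
count_trigrams("the boy runs".split(), trigram)
count_trigrams("the boy runs".split(), trigram)
count_trigrams("the girl walks".split(), trigram)

print(trigram["the"]["boy"]["runs"])    # seen twice
print(trigram["the"]["girl"]["walks"])  # seen once
```

With this layout, missing keys at any level are created on first access, so no explicit "does this key exist yet" checks are needed while counting.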
This can help you create grammar rules to identify languages or, in my case, create randomly generated text that looks very much like grammatical English. I need a three-layer dictionary because at any position of a three-word combination there can be another word that creates a whole different set of combinations. I TRIED to explain trigrams and the purpose behind them to the best of my ability... granted, I just started the class a couple of weeks ago.
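To illustrate the random-text use case, here is a rough sketch of picking a third word given the first two, weighted by the stored counts. The counts dictionary and the next_word function are hypothetical and only mirror the dict1/dict2/dict3 layout described above; this is not necessarily how the class expects it to be done.

```python
import random

# Hypothetical counts in the dict1 -> dict2 -> dict3 layout.
counts = {"the": {"boy": {"runs": 3, "walks": 1}}}

def next_word(counts, w1, w2, rng=random):
    """Pick a word that followed (w1, w2), weighted by how often it was seen."""
    # .get avoids a KeyError on plain dicts and avoids creating
    # empty entries if counts is a nested defaultdict.
    followers = counts.get(w1, {}).get(w2, {})
    if not followers:
        return None  # this word pair was never seen
    words = list(followers)
    weights = [followers[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

print(next_word(counts, "the", "boy"))  # "runs" is more likely than "walks"
```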
Now... with ALL of that being said, how would I go about creating a dictionary of dictionaries of dictionaries whose innermost dictionary holds values of type int in Python?
trigram = defaultdict( defaultdict(defaultdict(int)))
throws an error for me