I have a nested dictionary like this:
myDict = {'a': {1:2, 2:163, 3:12, 4:67, 5:84}, 'about': {1:27, 2:45, 3:21, 4:10, 5:15}, 'apple': {1:0, 2: 5, 3:0, 4:10, 5:0}, 'anticipate': {1:1, 2:5, 3:0, 4:8, 5:7}, 'an': {1:3, 2:15, 3:1, 4:312, 5:100}}
- The outer keys are words,
- the inner keys are file/document ids, and
- the values are the number of times the word (the outer key) occurs in that file/document.
How do I calculate, for each inner key, the sum of the squared values? For example, for inner key 1, I should get:
2^2 + 27^2 + 0^2 + 1^2 + 3^2
because in file/document 1, 'a' appears 2 times, 'about' 27 times, 'apple' 0 times, 'anticipate' once and 'an' 3 times.
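Here is a minimal sketch of that calculation for every inner key at once, written for Python 3. `sum_of_squares` is a hypothetical helper name and is not part of any library. If a file/document id is missing from a word's inner dictionary, that word contributes nothing to the total, which is the same as treating its count as 0:

```python
from collections import defaultdict

myDict = {'a': {1: 2, 2: 163, 3: 12, 4: 67, 5: 84},
          'about': {1: 27, 2: 45, 3: 21, 4: 10, 5: 15},
          'apple': {1: 0, 2: 5, 3: 0, 4: 10, 5: 0},
          'anticipate': {1: 1, 2: 5, 3: 0, 4: 8, 5: 7},
          'an': {1: 3, 2: 15, 3: 1, 4: 312, 5: 100}}


def sum_of_squares(nested):
    """Return {doc_id: sum of squared counts over all words} (hypothetical helper)."""
    totals = defaultdict(int)
    for counts in nested.values():          # counts is {doc_id: count} for one word
        for doc_id, count in counts.items():
            totals[doc_id] += count ** 2
    return dict(totals)


squares = sum_of_squares(myDict)
print(squares[1])  # 2**2 + 27**2 + 0**2 + 1**2 + 3**2 = 743
```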
Given the nested dictionary, how do I find the distance between a pair of files/documents?
For example, the distance between files/documents 1 and 2 would be calculated from these two vectors:
doc1 = {'a':2, 'about':27, 'apple':0, 'anticipate':1, 'an':3} # (i.e. inner key `1`)
doc2 = {'a':163, 'about':45, 'apple':5, 'anticipate':5, 'an':15} # (i.e. inner key `2`)
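One way to build those per-document vectors directly from `myDict` is to take the same inner key from every word. This is a sketch only. `doc_vector` is a hypothetical helper name, and a missing count is assumed to mean 0:

```python
myDict = {'a': {1: 2, 2: 163, 3: 12, 4: 67, 5: 84},
          'about': {1: 27, 2: 45, 3: 21, 4: 10, 5: 15},
          'apple': {1: 0, 2: 5, 3: 0, 4: 10, 5: 0},
          'anticipate': {1: 1, 2: 5, 3: 0, 4: 8, 5: 7},
          'an': {1: 3, 2: 15, 3: 1, 4: 312, 5: 100}}


def doc_vector(nested, doc_id):
    """Collect one document's counts as {word: count}; missing entries become 0."""
    return {word: counts.get(doc_id, 0) for word, counts in nested.items()}


doc1 = doc_vector(myDict, 1)
doc2 = doc_vector(myDict, 2)
print(doc1)  # {'a': 2, 'about': 27, 'apple': 0, 'anticipate': 1, 'an': 3}
```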
I want to know how different or similar the documents are. How do I get a single floating-point number as a distance score for the two documents?
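No single distance is the correct one. A common choice is the Euclidean distance between the two count vectors. Here is a minimal sketch that uses the raw counts with no normalization. `euclidean_distance` is a hypothetical helper name:

```python
import math

doc1 = {'a': 2, 'about': 27, 'apple': 0, 'anticipate': 1, 'an': 3}
doc2 = {'a': 163, 'about': 45, 'apple': 5, 'anticipate': 5, 'an': 15}


def euclidean_distance(u, v):
    """Straight-line distance between two {word: count} vectors (missing word = 0)."""
    words = set(u) | set(v)
    return math.sqrt(sum((u.get(w, 0) - v.get(w, 0)) ** 2 for w in words))


dist = euclidean_distance(doc1, doc2)
# sqrt(161**2 + 18**2 + 5**2 + 4**2 + 12**2) = sqrt(26430)
print(dist)
```

A smaller value means the documents are more similar. Because raw counts are used, a long document can end up far from a short one even when both use words in the same proportions. The cosine formula in the question does not have that problem.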
How do I calculate the dot product of these two documents?
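The dot product multiplies the two counts for each word and adds up the results. Here is a sketch that uses a hypothetical `dot` helper and treats a word missing from either vector as 0:

```python
doc1 = {'a': 2, 'about': 27, 'apple': 0, 'anticipate': 1, 'an': 3}
doc2 = {'a': 163, 'about': 45, 'apple': 5, 'anticipate': 5, 'an': 15}


def dot(u, v):
    """Sum of count products over shared words; a word absent from v counts as 0."""
    return sum(count * v.get(word, 0) for word, count in u.items())


print(dot(doc1, doc2))  # 2*163 + 27*45 + 0*5 + 1*5 + 3*15 = 1591
```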
I've tried calculating a single value for each document against a search phrase like this:
((2*0) + (27*0) + (3*1) + (1*1) + (0*1)) / (magnitude of file vector * magnitude of search phrase vector)
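That formula is cosine similarity: the dot product divided by the product of the two magnitudes. Each magnitude is the square root of a document's sum of squared counts, so the magnitude of document 1 is the square root of 743. The sketch below compares documents 1 and 2. The helper names are hypothetical. Taking `1 - similarity` is one common way to turn the result into a distance:

```python
import math

doc1 = {'a': 2, 'about': 27, 'apple': 0, 'anticipate': 1, 'an': 3}
doc2 = {'a': 163, 'about': 45, 'apple': 5, 'anticipate': 5, 'an': 15}


def dot(u, v):
    return sum(count * v.get(word, 0) for word, count in u.items())


def magnitude(u):
    # Square root of the sum of squared counts (sqrt(743) for document 1).
    return math.sqrt(sum(count * count for count in u.values()))


def cosine_similarity(u, v):
    denom = magnitude(u) * magnitude(v)
    # Guard against an all-zero vector, which would otherwise divide by zero.
    return dot(u, v) / denom if denom else 0.0


sim = cosine_similarity(doc1, doc2)
cosine_distance = 1 - sim
print(sim, cosine_distance)
```

Note the parentheses around the denominator. In Python, `a / b * c` means `(a / b) * c`, not `a / (b * c)`.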
My code so far:
vecDist = {}
for word in search:
    for fileNum in myDict.iteritems():
        # "dotproduct" is a placeholder: I don't know how to compute it yet
        vecDist[fileNum] = "dotproduct" / (magnitudeFileVec[fileNum] * magnitudeSearchVec)
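Here is a sketch of a working version of that loop, which scores every file/document against a search phrase. It makes these assumptions:

- The search phrase is a list of words, each with weight 1, as the `*1` terms in the formula suggest.
- `search = ['an', 'anticipate', 'apple']` is a made-up example.
- `score_documents` is a hypothetical name.

The loop runs over document ids instead of `myDict.iteritems()`. That method yields `(word, inner_dict)` pairs rather than file numbers, and it exists only in Python 2:

```python
import math

myDict = {'a': {1: 2, 2: 163, 3: 12, 4: 67, 5: 84},
          'about': {1: 27, 2: 45, 3: 21, 4: 10, 5: 15},
          'apple': {1: 0, 2: 5, 3: 0, 4: 10, 5: 0},
          'anticipate': {1: 1, 2: 5, 3: 0, 4: 8, 5: 7},
          'an': {1: 3, 2: 15, 3: 1, 4: 312, 5: 100}}

search = ['an', 'anticipate', 'apple']  # hypothetical search phrase


def score_documents(nested, search_words):
    """Cosine similarity of each document against a search vector of 1s (hypothetical)."""
    query = {w: 1 for w in search_words}        # duplicate words collapse to one entry
    query_mag = math.sqrt(len(query))           # sqrt(1**2 + 1**2 + ...)
    doc_ids = {doc_id for counts in nested.values() for doc_id in counts}
    vecDist = {}
    for doc_id in sorted(doc_ids):
        # Dot product with a 0/1 query is just the sum of this document's counts
        # for the search words (unknown words count as 0).
        dot = sum(nested.get(w, {}).get(doc_id, 0) for w in query)
        file_mag = math.sqrt(sum(counts.get(doc_id, 0) ** 2 for counts in nested.values()))
        denom = file_mag * query_mag
        vecDist[doc_id] = dot / denom if denom else 0.0
    return vecDist


vecDist = score_documents(myDict, search)
print(vecDist)
```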