Calculating distance between word/document vectors from a nested dictionary

2024/11/25 5:05:41

I have a nested dictionary as such:

myDict = {'a': {1:2, 2:163, 3:12, 4:67, 5:84}, 'about': {1:27, 2:45, 3:21, 4:10, 5:15}, 'apple': {1:0, 2: 5, 3:0, 4:10, 5:0}, 'anticipate': {1:1, 2:5, 3:0, 4:8, 5:7}, 'an': {1:3, 2:15, 3:1, 4:312, 5:100}}
  • The outer key is a word.
  • The inner keys are file/document ids.
  • The values are the number of times the word (the outer key) occurs in that document.

How do I calculate the sum of the squared values for each inner key? For example, for inner key 1, I should get:

2^2 + 27^2 + 0^2 + 1^2 + 3^2

because in document 1 the word 'a' appears 2 times, 'about' 27 times, 'apple' 0 times, 'anticipate' 1 time and 'an' 3 times.


Given the nested dictionary object, how do I find the distance between a pair of files/documents?

For example, the distance between file/document ids 1 and 2 would be calculated from these two vectors:

doc1 = {'a':2, 'about':27, 'apple':0, 'anticipate':1, 'an':3} # (i.e. inner key `1`)
doc2 = {'a':163, 'about':45, 'apple':5, 'anticipate':5, 'an':15} # (i.e. inner key `2`)

I want to know how different/similar the documents are, so how do I get a single floating number as a distance score for the two documents?

How do I calculate the dot product across these two documents?

I've tried calculating a single value for each document by considering:

((2*0) + (27*0) + (3*1) + (1*1) + (0*1)) / (magnitude of file vector * magnitude of search phrase vector)

Using my code as such:

vecDist = {}
for word in search:
    for fileNum in myDict.iteritems():
        vecDist[fileNum] = "dotproduct" / magnitudeFileVec[fileNum] * magnitudeSearchVec

Answer

The first bit is easy enough. You want to build a dictionary that maps each file number to the sum of the squares of its values. Something like this (untested) should do it:

fileVectors = {}
for wordDict in myDict.values():  # .values()/.items() work in both Python 2 and 3
    for fileNumber, wordCount in wordDict.items():
        # Accumulate the squared count for this file number
        fileVectors[fileNumber] = fileVectors.get(fileNumber, 0) + (wordCount ** 2)
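
For the second bit (a single score for a pair of documents), here is a rough sketch of one way to do it, assuming Python 3 and that cosine similarity is the score you are after, as the formula in the question suggests. The helper names (doc_vector, dot_product, magnitude, cosine_similarity, euclidean_distance) are made up for this example and are not from any library. Note that the denominator has to be parenthesised as a whole: dot / (magnitude_a * magnitude_b).

```python
import math

# Sample data from the question: word -> {document id -> count}.
my_dict = {
    'a': {1: 2, 2: 163, 3: 12, 4: 67, 5: 84},
    'about': {1: 27, 2: 45, 3: 21, 4: 10, 5: 15},
    'apple': {1: 0, 2: 5, 3: 0, 4: 10, 5: 0},
    'anticipate': {1: 1, 2: 5, 3: 0, 4: 8, 5: 7},
    'an': {1: 3, 2: 15, 3: 1, 4: 312, 5: 100},
}


def doc_vector(word_counts, doc_id):
    """Build {word: count} for one document; a missing count is treated as 0."""
    return {word: counts.get(doc_id, 0) for word, counts in word_counts.items()}


def dot_product(vec_a, vec_b):
    """Sum of the products of the counts for each word."""
    return sum(count * vec_b.get(word, 0) for word, count in vec_a.items())


def magnitude(vec):
    """Square root of the sum of the squared counts."""
    return math.sqrt(sum(count ** 2 for count in vec.values()))


def cosine_similarity(vec_a, vec_b):
    """Dot product divided by the product of the two magnitudes."""
    denominator = magnitude(vec_a) * magnitude(vec_b)
    # Returning 0.0 for an all-zero vector is a choice made for this sketch.
    return dot_product(vec_a, vec_b) / denominator if denominator else 0.0


def euclidean_distance(vec_a, vec_b):
    """Straight-line distance between the two count vectors."""
    words = set(vec_a) | set(vec_b)
    return math.sqrt(sum((vec_a.get(w, 0) - vec_b.get(w, 0)) ** 2 for w in words))


doc1 = doc_vector(my_dict, 1)
doc2 = doc_vector(my_dict, 2)

print(dot_product(doc1, doc2))         # 2*163 + 27*45 + 0*5 + 1*5 + 3*15
print(cosine_similarity(doc1, doc2))   # closer to 1.0 means more similar
print(euclidean_distance(doc1, doc2))  # larger means more different
```

Cosine similarity ignores document length, so it is usually the better fit when the files differ a lot in size. If you want a distance rather than a similarity, 1 - cosine_similarity(doc1, doc2) is a common choice. Euclidean distance is included only for comparison.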