How to normalize similarity measures from WordNet

2024/4/15 2:06:19

I am trying to calculate the semantic similarity between two words. I am using WordNet-based similarity measures, i.e. the Resnik measure (RES), the Lin measure (LIN), the Jiang and Conrath measure (JNC), and the Banerjee and Pedersen measure (BNP).

To do that, I am using NLTK and WordNet 3.0. Next, I want to combine the similarity values obtained from the different measures. To do that, I need to normalize the similarity values, since some measures give values between 0 and 1 while others give values greater than 1.

So, my question is: how do I normalize the similarity values obtained from the different measures?

Extra detail on what I am actually trying to do: I have a set of words. I calculate the pairwise similarity between the words and remove the words that are not strongly correlated with the other words in the set.


How to normalize a single measure

Let's consider a single arbitrary similarity measure M and take an arbitrary word w.

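Define m = M(w, w). Then m is the maximum value M can take for w, assuming no word is more similar to w than w itself. For some measures m depends on w (for Resnik, M(w, w) is the information content of w), so compute m per word rather than once. The method also needs m to be finite: Jiang and Conrath similarity is the reciprocal of a distance, so it is unbounded when w = u and has to be capped or rescaled some other way.

Let MN denote the normalized version of M.

For any two words w, u you can compute MN(w, u) = M(w, u) / m.

It's easy to see that if M takes non-negative values and M(w, u) <= M(w, w), then MN takes values in [0, 1].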
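The normalization above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `normalize` and the toy measure `shared_letters` are hypothetical names, not part of NLTK, and the toy measure stands in for a real WordNet measure so the example runs without any corpus. With NLTK, the `measure` argument could instead wrap a call such as `synset.res_similarity(other, ic)`.

```python
from typing import Callable

Measure = Callable[[str, str], float]


def normalize(measure: Measure) -> Measure:
    """Return MN with MN(w, u) = M(w, u) / M(w, w)."""

    def normalized(w: str, u: str) -> float:
        m = measure(w, w)  # self-similarity, assumed to be the maximum for w
        if m <= 0:
            # Assumption: a non-positive self-similarity is treated as "no similarity".
            return 0.0
        return measure(w, u) / m

    return normalized


def shared_letters(w: str, u: str) -> float:
    """Toy stand-in for a WordNet measure: number of distinct shared letters."""
    return float(len(set(w) & set(u)))


shared_letters_n = normalize(shared_letters)

print(shared_letters_n("cat", "cat"))  # self-similarity normalizes to 1.0
print(shared_letters_n("cat", "car"))  # 2 of the 3 letters of "cat" are shared
```

Because m is computed from the first argument, the normalized value can differ between MN(w, u) and MN(u, w) whenever M(w, w) and M(u, u) differ.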

How to normalize a measure combined from many measures

To compute your own measure F as a combination of k different measures m_1, m_2, ..., m_k, first normalize each m_i independently using the method above, and then define weights:

alpha_1, alpha_2, ..., alpha_k

such that alpha_i denotes the weight of the i-th measure and each alpha_i is non-negative.

All alphas must sum to 1, i.e.:

alpha_1 + alpha_2 + ... + alpha_k = 1

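Then, to compute your own measure for w, u, you do:

F(w, u) = alpha_1 * m_1(w, u) + alpha_2 * m_2(w, u) + ... + alpha_k * m_k(w, u)

where each m_i is the normalized measure. Because every normalized m_i(w, u) lies in [0, 1] and the weights sum to 1, F takes values in [0, 1], provided the weights are non-negative.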
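As a rough sketch of the weighted combination, the following Python builds F from a list of already-normalized measures and a list of weights. The function `combine` and the two toy measures are hypothetical and exist only to make the example self-contained; in practice the measures would be the normalized WordNet measures.

```python
from typing import Callable, Sequence

Measure = Callable[[str, str], float]


def combine(measures: Sequence[Measure], weights: Sequence[float]) -> Measure:
    """Return F(w, u) = sum_i alpha_i * m_i(w, u) for normalized measures m_i."""
    if len(measures) != len(weights):
        raise ValueError("need exactly one weight per measure")
    if any(a < 0 for a in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")

    def combined(w: str, u: str) -> float:
        return sum(a * m(w, u) for a, m in zip(weights, measures))

    return combined


# Two toy measures that already take values in [0, 1] (hypothetical stand-ins).
def same_first_letter(w: str, u: str) -> float:
    return 1.0 if w[:1] == u[:1] else 0.0


def letter_overlap(w: str, u: str) -> float:
    a, b = set(w), set(u)
    return len(a & b) / len(a | b) if (a | b) else 0.0


F = combine([same_first_letter, letter_overlap], [0.25, 0.75])

print(F("cat", "car"))  # 0.25 * 1.0 + 0.75 * 0.5
```

Checking the weights up front keeps the [0, 1] guarantee from silently breaking when a weight is mistyped.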
