I am trying to calculate semantic similarity between two words. I am using WordNet-based similarity measures, i.e. the Resnik measure (RES), Lin measure (LIN), Jiang and Conrath measure (JNC), and Banerjee and Pedersen measure (BNP).

To do that, I am using NLTK and WordNet 3.0. Next, I want to combine the similarity values obtained from the different measures. To do that I need to normalize the similarity values, as some measures give values between 0 and 1, while others give values greater than 1.

So, my question is: how do I normalize the similarity values obtained from the different measures?

**Extra detail** on what I am actually trying to do: I have a set of words. I calculate the pairwise similarity between the words and remove the words that are not strongly correlated with the other words in the set.
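For reference, the first three measures can be computed directly on NLTK synsets; the Banerjee and Pedersen gloss-overlap measure is not part of NLTK's core WordNet API, so it is omitted in this sketch (the synsets chosen are illustrative):

```
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

# Requires the 'wordnet' and 'wordnet_ic' NLTK data packages.
brown_ic = wordnet_ic.ic('ic-brown.dat')  # information content from the Brown corpus

dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')

print(dog.res_similarity(cat, brown_ic))  # RES: can exceed 1
print(dog.lin_similarity(cat, brown_ic))  # LIN: already in [0, 1]
print(dog.jcn_similarity(cat, brown_ic))  # JNC: can exceed 1
```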

## How to normalize a single measure

Let's consider a single arbitrary similarity measure `M` and take an arbitrary word `w`. Define `m = M(w, w)`. Then `m` takes the maximum possible value of `M`.

Let's define `MN` as the normalized measure `M`. For any two words `w, u` you can compute `MN(w, u) = M(w, u) / m`.

It's easy to see that if `M` takes non-negative values, then `MN` takes values in `[0, 1]`.
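A minimal sketch of this normalization in Python, using NLTK's Resnik measure as the example `M` (the helper name `normalized_res` and the choice of Resnik are illustrative, not prescribed):

```
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')

def normalized_res(w, u, ic=brown_ic):
    # m = M(w, w): the measure's value for the word compared with itself
    m = w.res_similarity(w, ic)
    return w.res_similarity(u, ic) / m

dog, cat = wn.synset('dog.n.01'), wn.synset('cat.n.01')
print(normalized_res(dog, cat))
```

Note that for measures such as LIN, `M(w, w)` is already 1, so the division changes nothing.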

## How to normalize a measure combined from many measures

In order to compute your own defined measure `F`, combined from k different measures `m_1, m_2, ..., m_k`, first normalize each `m_i` independently using the above method and then define weights:

```
alpha_1, alpha_2, ..., alpha_k
```

such that `alpha_i` denotes the weight of the i-th measure. All alphas must sum up to 1, i.e.:

```
alpha_1 + alpha_2 + ... + alpha_k = 1
```

Then to compute your own measure for `w, u` you do:

```
F(w, u) = alpha_1 * m_1(w, u) + alpha_2 * m_2(w, u) + ... + alpha_k * m_k(w, u)
```

It's clear that `F` takes values in `[0, 1]`, since it is a weighted average of values in `[0, 1]` whose weights sum to 1.
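A minimal sketch of this combination (the function name and equal-weight default are illustrative choices, not prescribed):

```
def combined_measure(measures, weights, w, u):
    # F(w, u) = alpha_1*m_1(w, u) + ... + alpha_k*m_k(w, u)
    # measures: callables m_i(w, u), each already normalized into [0, 1]
    # weights:  the alphas, one per measure, summing to 1
    assert abs(sum(weights) - 1.0) < 1e-9, "alphas must sum up to 1"
    return sum(alpha * m(w, u) for alpha, m in zip(weights, measures))
```

With no prior preference among the measures, equal weights `alpha_i = 1/k` are a natural default.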