python - How to normalize similarity measures from WordNet
I am trying to calculate the semantic similarity between two words using WordNet-based similarity measures, i.e. the Resnik measure (res), the Lin measure (lin), the Jiang-Conrath measure (jcn), and the Banerjee-Pedersen measure (bnp).
To do that, I am using NLTK and WordNet 3.0. Next, I want to combine the similarity values obtained from the different measures. I need to normalize the similarity values first, because some measures give values between 0 and 1, while others give values greater than 1.
So, my question is: how do I normalize the similarity values obtained from the different measures?
Extra detail on what I am trying to do: I have a set of words. I calculate the pairwise similarity between the words, and then remove words that are not well correlated with the other words in the set.
How to normalize a single measure

Let's consider a single arbitrary similarity measure `m` and take an arbitrary word `w`. Define `M = m(w, w)`; `M` is the maximum possible value that `m` can take. Let's define `mn` as the normalized version of the measure `m`. For two words `w, u` we can compute:

mn(w, u) = m(w, u) / M

It's easy to see that if `m` takes non-negative values, then `mn` takes values in [0, 1].
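Under the assumption above (that `m(w, w)` is the maximum value `m` can take), the normalization can be sketched as follows; `toy_measure` here is a made-up illustrative measure, not one of the WordNet measures:

```python
def toy_measure(w, u):
    # Toy similarity: number of shared distinct letters. For a fixed w,
    # m(w, w) is the maximum, since m(w, u) <= len(set(w)) for any u.
    return len(set(w) & set(u))

def normalize(m, w, u):
    # mn(w, u) = m(w, u) / M, where M = m(w, w)
    return m(w, u) / m(w, w)

print(normalize(toy_measure, "dog", "god"))  # 1.0 -- same letters
print(normalize(toy_measure, "dog", "cat"))  # 0.0 -- no shared letters
```

The same `normalize` helper works for any non-negative measure whose maximum is attained at `m(w, w)`.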
How to normalize a measure combined from many measures

In order to compute your own measure `f`, combined from `k` different measures `m_1, m_2, ..., m_k`, first normalize each `m_i` independently using the method above, obtaining `mn_1, mn_2, ..., mn_k`. Then define weights:

alpha_1, alpha_2, ..., alpha_k

such that `alpha_i` denotes the weight of the i-th measure. All alphas must sum to 1, i.e.:

alpha_1 + alpha_2 + ... + alpha_k = 1

Then, to compute your own measure for `w, u`, do:

f(w, u) = alpha_1 * mn_1(w, u) + alpha_2 * mn_2(w, u) + ... + alpha_k * mn_k(w, u)

Since each `mn_i` takes values in [0, 1] and the alphas sum to 1, it's clear that `f` also takes values in [0, 1].
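The weighted combination above can be sketched like this; the two toy normalized measures (`mn1`, `mn2`) and the weights are illustrative assumptions, not values from the question:

```python
def combine(normalized_measures, alphas, w, u):
    # f(w, u) = sum_i alpha_i * mn_i(w, u); the alphas must sum to 1.
    assert abs(sum(alphas) - 1.0) < 1e-9
    return sum(a * mn(w, u) for a, mn in zip(alphas, normalized_measures))

# Two toy normalized measures, each returning values in [0, 1]:
mn1 = lambda w, u: len(set(w) & set(u)) / len(set(w) | set(u))  # Jaccard on letters
mn2 = lambda w, u: 1.0 if w == u else 0.0                       # exact match

print(combine([mn1, mn2], [0.7, 0.3], "dog", "dog"))  # 1.0
print(combine([mn1, mn2], [0.7, 0.3], "dog", "cat"))  # 0.0
```

Because each component lies in [0, 1] and the weights form a convex combination, the result is guaranteed to stay in [0, 1].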