Probabilistic interpretation of the Itakura–Saito distance

According to Wikipedia, the Itakura–Saito distance (or Itakura–Saito divergence, IS distance) is a measure of the difference between an original spectrum P(\omega) and an approximation \hat{P}(\omega) of that spectrum. It was proposed by Fumitada Itakura and Shuzo Saito in the 1960s while they were with NTT.[1]

This distance is used in several spectral estimation methods and is especially common in speech processing. Beyond treating it simply as a distance measure, it is interesting to look at its probabilistic interpretation, which shows in what statistical sense the distance is optimal.

Let y be the observed variable, a noisy version of the actual hidden variable x, and let w be the noise. y is generated from x through multiplicative noise:

y = x \cdot w

where w follows a Gamma distribution with parameters (\alpha, \beta). Given y, the noise can be expressed in terms of x as

w=\frac{y}{x}

and its density is

p(w)=c w^{\alpha-1} e^{-\beta w}, where c is a constant that does not depend on w or x
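The generative model above can be simulated in a few lines. This is only a minimal sketch: the function name `sample_observation`, the value of `x_true`, and the choice \alpha=2, \beta=1 are assumptions made for illustration. It also assumes that \beta is a rate parameter, as in the density above, whereas Python's `random.gammavariate` expects a scale, so the code passes 1/\beta.

```python
import random


def sample_observation(x, alpha, beta, rng):
    """Draw y = x * w with w ~ Gamma(shape=alpha, rate=beta).

    Hypothetical helper for illustration only.
    """
    # random.gammavariate(alpha, beta) uses a *scale* parameter,
    # so a rate of `beta` corresponds to a scale of 1 / beta.
    w = rng.gammavariate(alpha, 1.0 / beta)
    return x * w


rng = random.Random(0)          # fixed seed so the run is repeatable
x_true = 3.0                    # assumed hidden value
alpha, beta = 2.0, 1.0          # assumed noise parameters

samples = [sample_observation(x_true, alpha, beta, rng) for _ in range(100_000)]
mean_y = sum(samples) / len(samples)

# The mean of the noise is alpha / beta, so the sample mean of y
# should be close to x_true * alpha / beta.
print(mean_y, x_true * alpha / beta)
```

Because the noise is multiplicative and positive, every observation y stays positive, and its average is scaled by the noise mean \alpha/\beta rather than shifted.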

Substituting w=\frac{y}{x} into p(w) and taking the log of both sides gives

\log p(w)=\log c+(\alpha-1)\log(\frac{y}{x})-\beta\frac{y}{x}

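Treating p(w), evaluated at w=\frac{y}{x}, as the likelihood of x, the MLE of x given y is

x^{*}=\arg\min_x -\log p(w)

Strictly speaking, the density of y given x also carries a Jacobian factor \frac{1}{x} from the change of variables. It is left out here; including it would change the coefficient of the log term from \alpha-1 to \alpha.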

Dropping the constant \log c, which does not depend on x:

x^{*}=\arg\min_x -(\alpha-1)\log(\frac{y}{x})+\beta\frac{y}{x}

If \alpha=2, \beta=1, this reduces to the IS distance:

x^{*}=\arg\min_x -\log(\frac{y}{x})+\frac{y}{x}

Normally -1 is also added so that the distance is zero when x=y, giving \frac{y}{x}-\log(\frac{y}{x})-1.

So, in this derivation, minimizing the IS distance gives the optimal (maximum likelihood) estimate when the actual value is multiplicatively corrupted by gamma noise with \alpha=2, \beta=1. Other values of \alpha and \beta give slightly different variants of the distance.
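The claim can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names, the grid bounds, and the test value y=4 are all made up for the example. It minimizes the objective -(\alpha-1)\log(\frac{y}{x})+\beta\frac{y}{x} by brute force over a grid. Setting the derivative with respect to x to zero gives x=\frac{\beta y}{\alpha-1}, so the minimizer should be x=y in the IS case and a rescaled value otherwise.

```python
import math


def neg_log_noise_density(x, y, alpha, beta):
    """Objective -(alpha - 1) * log(y / x) + beta * y / x (constant dropped)."""
    r = y / x
    return -(alpha - 1.0) * math.log(r) + beta * r


def itakura_saito(y, x):
    """Scalar IS distance y/x - log(y/x) - 1, which is zero when x == y."""
    r = y / x
    return r - math.log(r) - 1.0


def grid_argmin(y, alpha, beta, lo=0.01, hi=20.0, steps=100_000):
    """Brute-force minimizer of the objective over an evenly spaced grid.

    Hypothetical helper: the bounds and resolution are arbitrary choices.
    """
    best_x, best_val = lo, float("inf")
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        val = neg_log_noise_density(x, y, alpha, beta)
        if val < best_val:
            best_x, best_val = x, val
    return best_x


y = 4.0                                           # assumed observation
x_is = grid_argmin(y, alpha=2.0, beta=1.0)        # IS case
x_variant = grid_argmin(y, alpha=3.0, beta=1.0)   # a different variant

print(x_is, x_variant)
```

With \alpha=2, \beta=1 the objective equals the IS distance plus one, so both have the same minimizer. With \alpha=3, \beta=1 the minimizer moves to \frac{y}{2}, which is one way to see how other parameter choices change the resulting distance.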

 
