Preliminaries on I-divergence

The I-divergence (information divergence, also called relative entropy or Kullback-Leibler distance) of probability distributions (PD's) $P, Q$ on a finite set $X$ is defined as

  $$D(P\|Q) = \sum_{x \in X} P(x) \log \frac{P(x)}{Q(x)} \tag{1.1}$$

(in this paper, we use natural logarithms). The I-divergence of PD's on an arbitrary measurable space $(X, \mathcal{F})$, i.e., of probability measures $P, Q$ on $(X, \mathcal{F})$, is defined as

  $$D(P\|Q) = \sup_{\mathcal{A}} D(P^{\mathcal{A}} \| Q^{\mathcal{A}}) \tag{1.2}$$

the sup taken over all $\mathcal{F}$-measurable partitions $\mathcal{A} = (A_1, \ldots, A_k)$ of $X$. Here $P^{\mathcal{A}}$ denotes the $\mathcal{A}$-quantization of $P$, defined as the PD $(P(A_1), \ldots, P(A_k))$ on $\{1, \ldots, k\}$. A well known integral formula for $D(P\|Q)$ is

  $$D(P\|Q) = \int p(x) \log \frac{p(x)}{q(x)} \, \mu(dx) \tag{1.3}$$

where $p(x)$ and $q(x)$ are the densities of $P$ and $Q$ with respect to an arbitrary dominating measure $\mu$.
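For a finite set, the definition (1.1) and the quantization used in (1.2) can be tried out numerically. The following is a minimal Python sketch, not part of the original text: the function names `i_divergence` and `quantize`, the example distributions, and the conventions $0 \log 0 = 0$ and $D = +\infty$ when $P$ puts mass where $Q$ does not are all illustrative assumptions. It shows that merging cells of a partition cannot increase the divergence, which is why the supremum in (1.2) is approached by ever finer partitions.

```python
import math

def i_divergence(p, q):
    """D(P||Q) = sum_x P(x) log(P(x)/Q(x)), natural logarithms.

    Assumed conventions: terms with P(x) = 0 contribute 0, and the
    result is +inf if P(x) > 0 while Q(x) = 0 for some x.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue
        if qx == 0.0:
            return math.inf
        total += px * math.log(px / qx)
    return total

def quantize(p, partition):
    """Quantization (P(A_1), ..., P(A_k)); each cell is a list of indices."""
    return [sum(p[i] for i in cell) for cell in partition]

# Hypothetical example distributions on a four-point set.
P = [0.5, 0.25, 0.125, 0.125]
Q = [0.25, 0.25, 0.25, 0.25]

full = i_divergence(P, Q)

# A coarser partition that merges points {0, 1} and {2, 3}.
coarse_partition = [[0, 1], [2, 3]]
coarse = i_divergence(quantize(P, coarse_partition),
                      quantize(Q, coarse_partition))

print(f"D(P||Q) on the full set:      {full:.6f}")
print(f"D(P^A||Q^A) on the partition: {coarse:.6f}")
```

On a finite set the finest partition (all singletons) recovers (1.1) exactly, so the supremum in (1.2) is attained there.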

I-divergence is a (non-symmetric) information theoretic measure of distance of $P$ from $Q$. A key property is that $D(P\|Q) \ge 0$, with equality iff $P = Q$. A stronger property known as Pinsker's inequality is

  $$D(P\|Q) \ge \frac{1}{2} |P - Q|^2 \tag{1.4}$$

where

  $$|P - Q| = \int |p(x) - q(x)| \, \mu(dx) \tag{1.5}$$

is the variation distance of $P$ and $Q$.
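Pinsker's inequality is easy to check numerically on a finite set, taking the counting measure as dominating measure so that the integral in the variation distance becomes a sum. The sketch below is illustrative only: the helper names and the example pairs are assumptions, and it takes the variation distance to be the full $L_1$ distance (no factor 1/2), which is the normalization under which the constant 1/2 goes with natural logarithms.

```python
import math

def i_divergence(p, q):
    """D(P||Q) with natural logarithms; assumes Q(x) > 0 wherever P(x) > 0."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0.0)

def variation_distance(p, q):
    """|P - Q| = sum_x |P(x) - Q(x)| (L1 distance, counting measure)."""
    return sum(abs(px - qx) for px, qx in zip(p, q))

def pinsker_gap(p, q):
    """D(P||Q) - |P - Q|^2 / 2; Pinsker's inequality says this is >= 0."""
    return i_divergence(p, q) - 0.5 * variation_distance(p, q) ** 2

# Hypothetical example pairs (P, Q).
pairs = [
    ([0.5, 0.25, 0.125, 0.125], [0.25, 0.25, 0.25, 0.25]),
    ([0.9, 0.1], [0.5, 0.5]),
    ([0.5, 0.5], [0.5, 0.5]),  # P = Q: both sides vanish
]

gaps = [pinsker_gap(p, q) for p, q in pairs]
for (p, q), gap in zip(pairs, gaps):
    print(f"D = {i_divergence(p, q):.6f}, "
          f"|P-Q|^2/2 = {0.5 * variation_distance(p, q) ** 2:.6f}, "
          f"gap = {gap:.6f}")
```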

While not a true metric, I-divergence is in many respects an analogue of squared Euclidean distance. In particular, if $\Pi$ is a convex set of PD's and the minimum of $D(P\|Q)$ subject to $P \in \Pi$ is attained, then the minimizer $P^*$, called the I-projection of $Q$ onto $\Pi$, is unique and

  $$D(P\|Q) \ge D(P\|P^*) + D(P^*\|Q) \quad \text{for every } P \in \Pi \tag{1.6}$$

(Csiszár 1975). If $\Pi$ is defined by a finite number of linear constraints then (1.6) holds with equality. The equality case is an analogue of the Pythagorean theorem, while the inequality (1.6) is an analogue of the cosine theorem in Euclidean geometry.
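The equality case can be illustrated on a finite set with a single linear constraint, $\Pi = \{P : \sum_x P(x) f(x) = c\}$. The sketch below rests on assumptions that are not in the text above: it uses the standard fact that, for strictly positive $Q$ and $c$ strictly between the smallest and largest value of $f$, the I-projection has the exponential form $P^*(x) \propto Q(x) e^{\theta f(x)}$, and it finds $\theta$ by bisection. The names `tilt`, `i_projection`, `pythagoras_gap`, and the values of `Q`, `f`, and `c` are hypothetical.

```python
import math

def i_divergence(p, q):
    """D(P||Q) with natural logarithms; assumes Q(x) > 0 wherever P(x) > 0."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0.0)

def mean(p, f):
    """Expectation of f under p."""
    return sum(px * fx for px, fx in zip(p, f))

def tilt(q, f, theta):
    """Exponentially tilted distribution proportional to Q(x) * exp(theta * f(x))."""
    weights = [qx * math.exp(theta * fx) for qx, fx in zip(q, f)]
    z = sum(weights)
    return [w / z for w in weights]

def i_projection(q, f, c, lo=-50.0, hi=50.0, iters=200):
    """I-projection of q onto {P : E_P[f] = c}, found by bisection on theta.

    Relies on the mean of the tilted distribution being increasing in theta,
    and on c lying strictly between min(f) and max(f).
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(tilt(q, f, mid), f) < c:
            lo = mid
        else:
            hi = mid
    return tilt(q, f, 0.5 * (lo + hi))

def pythagoras_gap(p, p_star, q):
    """D(P||Q) - D(P||P*) - D(P*||Q); zero when (1.6) holds with equality."""
    return i_divergence(p, q) - i_divergence(p, p_star) - i_divergence(p_star, q)

# Hypothetical example: three-point set, constraint E_P[f] = 1.2.
Q = [0.5, 0.3, 0.2]
f = [0.0, 1.0, 2.0]
c = 1.2
P_star = i_projection(Q, f, c)

# Two members of the constraint set (both have mean 1.2 under f).
members = [[0.2, 0.4, 0.4], [0.1, 0.6, 0.3]]
for P in members:
    print(f"D(P||Q) = {i_divergence(P, Q):.6f}, "
          f"gap = {pythagoras_gap(P, P_star, Q):.2e}")
```

Up to floating-point error the printed gaps should be zero for every member of the constraint set; for a convex set that is not given by linear constraints, only the inequality (1.6) would be expected.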



Ramesh Rao
Mon Apr 6 16:41:42 PDT 1998