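The I-divergence (information divergence, also called relative entropy or Kullback-Leibler distance) of probability distributions (PD's) $P$, $Q$ on a finite set $\mathcal{X}$ is defined as

$$D(P\|Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}$$

(in this paper, we use natural logarithms). The I-divergence of PD's on an arbitrary measurable space $(\mathcal{X}, \mathcal{F})$, i.e., of probability measures $P$, $Q$ on $(\mathcal{X}, \mathcal{F})$, is defined as

$$D(P\|Q) = \sup_{\mathcal{A}} D(P^{\mathcal{A}}\|Q^{\mathcal{A}}),$$

the sup taken for all $\mathcal{F}$-measurable partitions $\mathcal{A} = (A_1, \dots, A_k)$ of $\mathcal{X}$. Here $P^{\mathcal{A}}$ denotes the $\mathcal{A}$-quantization of $P$, defined as the PD $(P(A_1), \dots, P(A_k))$ on $\{1, \dots, k\}$. A well known integral formula for $D(P\|Q)$ is

$$D(P\|Q) = \int p(x) \log \frac{p(x)}{q(x)}\, \mu(dx),$$

where $p(x)$ and $q(x)$ are the densities of $P$ and $Q$ with respect to an arbitrary dominating measure $\mu$.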
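As a small numerical illustration of the finite-set definition and of quantization, the following Python sketch computes the I-divergence in nats and compares it with the divergence of a quantized pair. The function names `i_divergence` and `quantize`, the example distributions, and the convention that terms with $P(x) = 0$ contribute zero are assumptions made for this sketch, not notation from the text.

```python
import math

def i_divergence(p, q):
    """I-divergence D(P||Q) in nats for PDs on a finite set (equal-length sequences)."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue  # assumed convention: a term with P(x) = 0 contributes 0
        if qx == 0.0:
            return math.inf  # P puts mass where Q has none
        total += px * math.log(px / qx)
    return total

def quantize(p, partition):
    """Quantization of P: the PD (P(A_1), ..., P(A_k)) for cells given as index lists."""
    return [sum(p[i] for i in cell) for cell in partition]

# Hypothetical example on a four-point set, with a two-cell partition.
P = [0.5, 0.25, 0.125, 0.125]
Q = [0.25, 0.25, 0.25, 0.25]
A = [[0, 1], [2, 3]]

d_full = i_divergence(P, Q)
d_coarse = i_divergence(quantize(P, A), quantize(Q, A))
print(d_full, d_coarse)
```

In this example the quantized divergence does not exceed the full one, consistent with defining the general I-divergence as a supremum over partitions.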
I-divergence is a (non-symmetric) information theoretic measure of distance of $P$ from $Q$. A key property is that $D(P\|Q) \ge 0$, with equality iff $P = Q$. A stronger property known as Pinsker's inequality is

$$D(P\|Q) \ge \frac{1}{2} |P - Q|^2,$$

where

$$|P - Q| = \int |p(x) - q(x)|\, \mu(dx)$$

is the variation distance of $P$ and $Q$.
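Pinsker's inequality can be checked numerically on a finite set, where the variation distance reduces to the sum of absolute differences. The sketch below is only a sanity check on randomly drawn distributions, not a proof. The helper names, the sample size, and the random seed are arbitrary choices made for illustration.

```python
import math
import random

def i_divergence(p, q):
    """I-divergence D(P||Q) in nats; assumes q[x] > 0 wherever p[x] > 0."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0.0)

def variation_distance(p, q):
    """Variation distance |P - Q| on a finite set: sum of absolute differences."""
    return sum(abs(px - qx) for px, qx in zip(p, q))

def random_pd(n, rng):
    """A random strictly positive PD on n points (hypothetical helper)."""
    w = [rng.random() + 1e-9 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)
worst_slack = math.inf
for _ in range(1000):
    P = random_pd(5, rng)
    Q = random_pd(5, rng)
    # Slack of Pinsker's inequality: D(P||Q) - |P - Q|^2 / 2 should be >= 0.
    slack = i_divergence(P, Q) - 0.5 * variation_distance(P, Q) ** 2
    worst_slack = min(worst_slack, slack)

print(worst_slack)
```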
While not a true metric, I-divergence is in many respects an analogue of squared Euclidean distance. In particular, if $\Pi$ is a convex set of PD's and the minimum of $D(P\|Q)$ subject to $P \in \Pi$ is attained, then the minimizer $P^*$, called the I-projection of $Q$ onto $\Pi$, is unique and

$$D(P\|Q) \ge D(P\|P^*) + D(P^*\|Q) \quad \text{for every } P \in \Pi \tag{1.6}$$

(Csiszár 1975). If $\Pi$ is defined by a finite number of linear constraints then (1.6) holds with equality. The equality case is an analogue of the Pythagorean theorem, while the inequality (1.6) is an analogue of the cosine theorem in Euclidean geometry.
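The equality case of (1.6) can be illustrated on a finite set with a single linear constraint $\sum_x P(x) f(x) = c$. The sketch below relies on the standard fact that, for strictly positive $Q$ and $c$ strictly between the smallest and largest values of $f$, the I-projection has the exponentially tilted form $P^*(x) \propto Q(x) e^{\theta f(x)}$. It finds $\theta$ by bisection, using that the tilted mean is increasing in $\theta$. The function names, the bracket $[-50, 50]$, and the example data are assumptions made for this illustration.

```python
import math

def i_divergence(p, q):
    """I-divergence D(P||Q) in nats; assumes q[x] > 0 wherever p[x] > 0."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0.0)

def tilt(q, f, theta):
    """Exponentially tilted PD proportional to q[x] * exp(theta * f[x])."""
    w = [qx * math.exp(theta * fx) for qx, fx in zip(q, f)]
    z = sum(w)
    return [wx / z for wx in w]

def mean(p, f):
    """Expectation of f under the PD p."""
    return sum(px * fx for px, fx in zip(p, f))

def i_projection(q, f, c, lo=-50.0, hi=50.0, iters=200):
    """I-projection of q onto {P : mean(P, f) = c}, by bisection on theta."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(tilt(q, f, mid), f) < c:
            lo = mid
        else:
            hi = mid
    return tilt(q, f, 0.5 * (lo + hi))

# Hypothetical example: uniform Q on four points, constraint "mean of f equals 2".
Q = [0.25, 0.25, 0.25, 0.25]
f = [0.0, 1.0, 2.0, 3.0]
c = 2.0
P_star = i_projection(Q, f, c)

P = [0.1, 0.2, 0.3, 0.4]  # another member of the constraint set (its mean of f is 2)
lhs = i_divergence(P, Q)
rhs = i_divergence(P, P_star) + i_divergence(P_star, Q)
print(lhs, rhs)  # the two sides should agree up to rounding
```

Bisection is used here only because it is simple and robust for a single constraint. With several linear constraints one would need a multidimensional solver or an iterative scaling scheme, which this sketch does not attempt.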