next up previous
Next: Measure concentration Up: Applications of IT in Previous: Gaussian measures dichotomy theorem

Large deviations: Sanov's theorem, Gibbs' conditioning principle

Let tex2html_wrap_inline638 be i.i.d. random variables with values in an arbitrary set tex2html_wrap_inline510 (equipped with a tex2html_wrap_inline580 -algebra tex2html_wrap_inline518 ), with common distribution Q. The empirical distribution tex2html_wrap_inline648 of tex2html_wrap_inline650 is the random probability measure on tex2html_wrap_inline510 defined by tex2html_wrap_inline654 . Sanov's theorem (Sanov 1957) says, intuitively, that for any PD tex2html_wrap_inline656 ,

  equation78

If tex2html_wrap_inline510 is a finite set, we have more exactly

  equation82

whenever P is a possible type for block-length n (a basic fact of the method of types, cf. Csiszár and Körner 1981, p.32). Clearly, (2.2) implies that

  equation85

for any set tex2html_wrap_inline558 of PD's on tex2html_wrap_inline510 in which n-types become dense as tex2html_wrap_inline670 .

For arbitrary tex2html_wrap_inline510 , a general form of Sanov's theorem says that for any set tex2html_wrap_inline558 of PD's on tex2html_wrap_inline510 for which the probabilities tex2html_wrap_inline678 are defined, we have

  equation90

  equation95

Here tex2html_wrap_inline680 and tex2html_wrap_inline682 denote the interior and closure of tex2html_wrap_inline558 in the tex2html_wrap_inline686 -topology, i.e., the topology of setwise convergence of probability measures (the tex2html_wrap_inline686 -topology is weaker than the topology of variation distance). Of course, the equality of the right hand sides of (2.4) and (2.5) is a sufficient condition for the limit relation (2.3). In particular, (2.3) always holds if tex2html_wrap_inline558 is convex and tex2html_wrap_inline692 for some tex2html_wrap_inline694 .

Of particular interest is the choice

  equation100

where f is a given (possibly vector valued) function on tex2html_wrap_inline510 and C is a given subset of the range of f; then tex2html_wrap_inline704 means that tex2html_wrap_inline706 .

In the parlance of large deviations theory (cf. Dembo and Zeitouni 1993), the asymptotic bounds (2.4), (2.5), together with the easily checked compactness (in tex2html_wrap_inline686 -topology) of the divergence balls

  equation104

mean that tex2html_wrap_inline648 satisfies the large deviation principle, with good rate function tex2html_wrap_inline534 .

The simplest available proof of this important result is inherently information theoretic (Groeneboom, Oosterhoff and Ruymgaart 1979; they acknowledge using a suggestion of this author in proving (2.8) below). We sketch the proof of the more difficult part (2.5).

For any (measurable) partition tex2html_wrap_inline520 of tex2html_wrap_inline510 , tex2html_wrap_inline678 is bounded above by the sum of probabilities tex2html_wrap_inline720 for all PD's P' on tex2html_wrap_inline532 such that tex2html_wrap_inline726 for some tex2html_wrap_inline562 and P' is an n-type. This implies by (2.2) that

displaymath632

This gives

displaymath633

and (2.5) follows by (1.2) if one checks that

  equation111

The latter is non-trivial but not too hard.

Suppose next that tex2html_wrap_inline558 is a convex set of PD's on tex2html_wrap_inline510 and the conditional distribution of tex2html_wrap_inline738 on the condition tex2html_wrap_inline704 belongs to tex2html_wrap_inline558 . This is always the case for tex2html_wrap_inline558 given by (2.6) if C is a convex set and tex2html_wrap_inline748 exists. Under the above hypotheses, the asymptotic bound (2.5) may be sharpened to a non-asymptotic bound. More importantly, under mild additional hypotheses, a strong form of Gibbs' conditioning principle (cf. Dembo and Zeitouni 1993) may be established by a simple IT reasoning (Csiszár 1984).

The following identity

  equation118

holds for any PD tex2html_wrap_inline750 on tex2html_wrap_inline752 whose marginals on tex2html_wrap_inline510 are equal to P, and for any PD Q on tex2html_wrap_inline510 . Take here tex2html_wrap_inline762 (the conditional distribution of tex2html_wrap_inline764 on the condition tex2html_wrap_inline704 ), then tex2html_wrap_inline768 by assumption. It follows that

  equation123

proving the claimed non-asymptotic bound. If tex2html_wrap_inline564 is the I-projection of Q onto tex2html_wrap_inline558 , from (2.10) and (1.6) we further obtain

  eqnarray127

where the last step follows from (2.9) with Q replaced by tex2html_wrap_inline564 .

Supposing finally that the given tex2html_wrap_inline558 satisfies (2.3), it follows from (2.11) - recalling tex2html_wrap_inline762 - that

  equation132

This says, intuitively, that conditionally on tex2html_wrap_inline704 , the random variables tex2html_wrap_inline786 ``almost behave'' as independent ones with common distribution tex2html_wrap_inline564 . In particular, (2.12) implies that for any fixed k

displaymath634

Thus the conditional joint distribution of tex2html_wrap_inline792 on the condition tex2html_wrap_inline704 converges to the product distribution tex2html_wrap_inline796 in a stronger sense than in variation distance, cf. (1.4). This is the promised strong version of Gibbs' conditioning principle.

The above result may be extended to the case when the I-projection of Q onto tex2html_wrap_inline558 does not exist. Namely, a unique tex2html_wrap_inline564 (not necessarily in tex2html_wrap_inline558 ) always exists such that with tex2html_wrap_inline808 implies tex2html_wrap_inline810 ; then (2.12) holds with this ``generalized I-projection'' tex2html_wrap_inline564 (Csiszár 1984).


next up previous
Next: Measure concentration Up: Applications of IT in Previous: Gaussian measures dichotomy theorem

Ramesh Rao
Mon Apr 6 16:41:42 PDT 1998