next up previous
Next: Iterative scalingEM algorithm Up: Information theoretic methods in Previous: Early results

Hypothesis testing: exponential rate optimal tests

Let tex2html_wrap_inline510 be a finite set and P a given PD on tex2html_wrap_inline510 . Suppose the null-hypothesis that an i.i.d. sample of size n comes from P is to be tested; the alternative hypothesis Q is not specified.

Then the test that accepts the null-hypothesis when the empirical distribution tex2html_wrap_inline648 of the sample belongs to the divergence ball tex2html_wrap_inline988 , is universally exponential rate optimal in the following sense.

The probability of type 1 error goes to 0 exponentially, with exponent a, and for any alternative hypothesis Q such that b(P,Q,a) below is positive, the probability of type 2 error goes to zero with exponent

  equation218

Even if the alternative hypothesis Q were specified, no tests with type 1 error probability exponent tex2html_wrap_inline1000 could have type 2 error probability that decreases with a larger exponent than b(P,Q,a); in particular, against an alternative Q with b(P,Q,a)=0, an exponential decrease of type 2 error is not achievable.

Also of interest is the modification of the above test replacing the constant a by a sequence tex2html_wrap_inline1010 such that tex2html_wrap_inline1012 . Then the type 1 error probability still goes to 0 (though no longer exponentially), and the probability of type 2 error against any alternative Q goes to zero with exponent tex2html_wrap_inline534 . The latter is best possible, by Stein's lemma.

The above results are due to Hoeffding (1965). For today's information theorists, their proof is an easy exercise in the method of types (cf. Csiszár and Körner 1981, p.44).

The extension to testing composite hypotheses is straightforward. To test the null-hypothesis that the true distribution belongs to a given set tex2html_wrap_inline558 of PD's on tex2html_wrap_inline510 , take the union of the divergence balls B(P,a), tex2html_wrap_inline562 , and accept the null-hypothesis when tex2html_wrap_inline648 is in that union. Then the type 1 error probability still goes to 0 with exponent a, and for any (simple) alternative Q, the type 2 error probability exponent will be the infimum of b(P,Q,a) subject to tex2html_wrap_inline562 , the best possible.

Notice that the acceptance criterion tex2html_wrap_inline1040 , i.e., tex2html_wrap_inline1042 , is equivalent to

displaymath972

Thus the above universally rate optimal tests are what statisticians call likelihood ratio tests.

Consider next hypothesis testing for PD's on an arbitrary set tex2html_wrap_inline510 (equipped with a tex2html_wrap_inline580 -algebra tex2html_wrap_inline518 ). Then the previous acceptance criterion does not make sense, as for a continuous distribution P always tex2html_wrap_inline1052 . One way out is to consider a refining sequence of partitions tex2html_wrap_inline1054 of tex2html_wrap_inline510 that generates tex2html_wrap_inline518 , and accept the null-hypothesis that the true distribution belongs to a given set tex2html_wrap_inline558 of PD's when tex2html_wrap_inline1062 . Assuming that m(n)=o(n) when the number tex2html_wrap_inline1066 of n-types for alphabet size m(n) is tex2html_wrap_inline1072 , the type 1 error probability of this test is still tex2html_wrap_inline1074 , while the type 2 error probability (against any given Q) will be tex2html_wrap_inline1078 where

  equation239

This approach is due to Tusnády 1977. He showed under a compactness assumption on tex2html_wrap_inline558 that tex2html_wrap_inline1082 in (3.3) approaches tex2html_wrap_inline1084 , and then the above test is universally exponential rate optimal. In particular, rate optimality always holds when the null-hypothesis is simple.

Notice the relationship of the results in this Subsection to those in Subsection 2.2.


next up previous
Next: Iterative scalingEM algorithm Up: Information theoretic methods in Previous: Early results

Ramesh Rao
Mon Apr 6 16:41:42 PDT 1998