Let
be a finite set and P a given PD on
.
Suppose the null-hypothesis that an i.i.d. sample of size
n comes from P is to be tested; the alternative
hypothesis Q is not specified.
Then the test that accepts the null-hypothesis when
the empirical distribution
of the sample
belongs to the divergence ball
, is universally
exponential rate optimal in the following sense.
The probability of type 1 error goes to 0 exponentially, with exponent a, and for any alternative hypothesis Q such that b(P,Q,a) below is positive, the probability of type 2 error goes to zero with exponent
Even if the alternative hypothesis Q were specified,
no tests with type 1 error probability exponent
could have type 2 error probability that decreases with
a larger exponent than b(P,Q,a); in particular,
against an alternative Q with b(P,Q,a)=0,
an exponential decrease of type 2 error is not achievable.
Also of interest is the modification of the above
test replacing the constant a by a sequence
such that
. Then the type 1
error probability still goes to 0 (though no longer
exponentially), and the probability of type 2 error
against any alternative Q goes to zero with exponent
. The latter is best possible, by Stein's
lemma.
The above results are due to Hoeffding (1965). For today's information theorists, their proof is an easy exercise in the method of types (cf. Csiszár and Körner 1981, p.44).
The extension to testing composite hypotheses
is straightforward. To test the null-hypothesis
that the true distribution belongs to a given
set
of PD's on
, take the union
of the divergence balls B(P,a),
, and
accept the null-hypothesis when
is
in that union. Then the type 1 error probability still
goes to 0 with exponent a, and for any (simple)
alternative Q, the type 2 error probability exponent
will be the infimum of b(P,Q,a) subject to
,
the best possible.
Notice that the acceptance criterion
, i.e.,
, is equivalent to
Thus the above universally rate optimal tests are what statisticians call likelihood ratio tests.
Consider next hypothesis testing for PD's on an arbitrary
set
(equipped with a
-algebra
).
Then the previous acceptance criterion does not make
sense, as for a continuous distribution P always
. One way out is to consider
a refining sequence of partitions
of
that generates
, and accept the null-hypothesis that the true
distribution belongs to a given set
of PD's
when
. Assuming that
m(n)=o(n) when the number
of
n-types for alphabet size m(n) is
, the type 1 error probability of this test
is still
,
while the type 2 error probability (against any given Q)
will be
where
This approach is due to Tusnády 1977. He showed under
a compactness assumption on
that
in (3.3)
approaches
, and then the above
test is universally exponential rate optimal. In
particular, rate optimality always holds when the
null-hypothesis is simple.
Notice the relationship of the results in this Subsection to those in Subsection 2.2.