Let us start with Wald's inequality relating the expected sample size of a sequential test to the type 1 and type 2 error probabilities.
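Assuming i.i.d. sampling from a distribution
known to be either $P$ or $Q$, a sequential test
accepts one of these hypotheses on the basis of a
sample $X^N = (X_1, \ldots, X_N)$
of random length $N$. Here
$N$ is a stopping time, i.e., knowledge of $X^n = (X_1, \ldots, X_n)$
determines whether or not $N = n$. Wald (1947)
proved that
$$E_P(N)\, D(P\|Q) \ge \alpha \log\frac{\alpha}{1-\beta} + (1-\alpha)\log\frac{1-\alpha}{\beta} \tag{3.1}$$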
where $\alpha$ is the probability under $P$ of accepting
$Q$ and $\beta$ is the probability under $Q$ of accepting
$P$. Moreover, Wald showed that his sequential probability
ratio test nearly attains the equality in (3.1).
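As a numerical illustration, the following minimal sketch evaluates both sides of (3.1) for a sequential probability ratio test between two Bernoulli distributions. The Bernoulli setting, the symmetric thresholds $\pm\log 19$, the truncation at `n_max` observations and all function names are assumptions made here for illustration; they are not taken from Wald's work. The test stops as soon as the log-likelihood ratio of $P$ to $Q$ leaves the interval between the two thresholds, and the quantities $E_P(N)$, $\alpha$ and $\beta$ are computed by propagating the probability mass of the paths that have not yet stopped, rather than by simulation.

```python
import math


def kl_bernoulli(p, q):
    """I-divergence D(P||Q) in nats between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def wald_rhs(alpha, beta):
    """Right-hand side of (3.1); requires 0 < alpha < 1 and 0 < beta < 1."""
    return (alpha * math.log(alpha / (1 - beta))
            + (1 - alpha) * math.log((1 - alpha) / beta))


def sprt_exact(p, q, upper, lower, n_max=2000):
    """Return (E_P[N], alpha, beta) for an SPRT of P=Bernoulli(p) vs Q=Bernoulli(q).

    The test accepts P when the log-likelihood ratio reaches `upper` and
    accepts Q when it falls to `lower`. Paths still undecided after n_max
    observations are dropped (assumed to carry negligible probability).
    """
    up, down = math.log(p / q), math.log((1 - p) / (1 - q))
    # alive[k] = (mass under P, mass under Q) of undecided paths with k ones
    alive = {0: (1.0, 1.0)}
    expected_n = alpha = beta = 0.0
    for n in range(1, n_max + 1):
        nxt = {}
        for k, (mass_p, mass_q) in alive.items():
            for x, px, qx in ((1, p, q), (0, 1 - p, 1 - q)):
                ones = k + x
                llr = ones * up + (n - ones) * down
                mp, mq = mass_p * px, mass_q * qx
                if llr >= upper:      # stop and accept P
                    expected_n += n * mp
                    beta += mq        # error if Q is true
                elif llr <= lower:    # stop and accept Q
                    expected_n += n * mp
                    alpha += mp       # error if P is true
                else:                 # keep sampling
                    old_p, old_q = nxt.get(ones, (0.0, 0.0))
                    nxt[ones] = (old_p + mp, old_q + mq)
        alive = nxt
    return expected_n, alpha, beta


def wald_sides(p, q, threshold):
    """Return (left side of (3.1), right side of (3.1), alpha, beta)."""
    expected_n, alpha, beta = sprt_exact(p, q, threshold, -threshold)
    return expected_n * kl_bernoulli(p, q), wald_rhs(alpha, beta), alpha, beta


if __name__ == "__main__":
    for p, q in ((0.7, 0.4), (0.6, 0.4)):
        lhs, rhs, alpha, beta = wald_sides(p, q, math.log(19))
        print(f"p={p}, q={q}: lhs={lhs:.6f}, rhs={rhs:.6f}, "
              f"alpha={alpha:.5f}, beta={beta:.5f}")
```

For $p = 0.7$, $q = 0.4$ the log-likelihood ratio overshoots the thresholds by varying amounts, so the left side strictly exceeds the right side. For $p = 0.6$, $q = 0.4$ the log-likelihood ratio moves in steps of $\pm\log 1.5$ and reaches each threshold without variation in the overshoot, so the two sides agree up to rounding.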
The IT interpretation makes this result
easy to understand: Denoting by $P^N$ and $Q^N$ the
distributions of the sample $X^N$ under $P$ and $Q$, the left hand
side of (3.1) equals $D(P^N\|Q^N)$ (this can be checked
using Wald's identity), whereas the right hand side
is the I-divergence of the $\mathcal{A}$-quantizations of
$P^N$ and $Q^N$ for $\mathcal{A} = (A, B)$, $A$ and $B$ being the
acceptance regions of $P$ and $Q$. Were the likelihood
ratio constant on both $A$ and $B$, the equality
would hold in (3.1). While no test can achieve this
exactly, in general, the sequential probability ratio test
comes close.
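A sketch of the computation behind this interpretation follows. It assumes a discrete alphabet, $E_P(N) < \infty$, $D(P\|Q) < \infty$, and a test that terminates with probability 1 under both hypotheses; these conditions, and the notation $P(x)$, $Q(x)$ for the probabilities of a single observation, are assumptions introduced here. Since the event $N = n$ is determined by the first $n$ observations, the likelihood ratio of the stopped sample is a product of $N$ single-observation ratios, and Wald's identity gives
$$D(P^N\|Q^N) = E_P\Big[\sum_{i=1}^{N}\log\frac{P(X_i)}{Q(X_i)}\Big] = E_P(N)\, E_P\Big[\log\frac{P(X_1)}{Q(X_1)}\Big] = E_P(N)\, D(P\|Q).$$
On the other hand, $P^N(A) = 1-\alpha$, $P^N(B) = \alpha$, $Q^N(A) = \beta$ and $Q^N(B) = 1-\beta$, so the I-divergence of the quantizations is
$$(1-\alpha)\log\frac{1-\alpha}{\beta} + \alpha\log\frac{\alpha}{1-\beta},$$
and the inequality follows because quantization cannot increase I-divergence.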
Another early result in statistical IT is the celebrated
``Stein's lemma'' (Chernoff 1952; Stein apparently
disowns it). It provides an operational meaning to
I-divergence: For testing a simple hypothesis $P$ against
a simple alternative $Q$, the best test of sample size $n$
and type 1 error probability $\le \varepsilon$
(for any $0 < \varepsilon < 1$)
has type 2 error probability
$\exp\{-n D(P\|Q) + o(n)\}$. Notice that if the type 1
error were required to go to zero, rather than just
to be $\le \varepsilon$, the special case $N = n$
of Wald's inequality (3.1) would already imply that
the type 2 error probability exponent cannot
exceed $D(P\|Q)$.
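The last remark can be verified in a few lines. Write $\alpha_n$ and $\beta_n$ for the type 1 and type 2 error probabilities of a test of fixed sample size $n$ (subscripted symbols introduced here for illustration), and let $h(\alpha) = -\alpha\log\alpha - (1-\alpha)\log(1-\alpha)$. With $N = n$, Wald's inequality reads
$$n D(P\|Q) \ge \alpha_n\log\frac{\alpha_n}{1-\beta_n} + (1-\alpha_n)\log\frac{1-\alpha_n}{\beta_n} = -h(\alpha_n) - \alpha_n\log(1-\beta_n) + (1-\alpha_n)\log\frac{1}{\beta_n}.$$
Dropping the nonnegative term $-\alpha_n\log(1-\beta_n)$ and rearranging,
$$\frac{1}{n}\log\frac{1}{\beta_n} \le \frac{D(P\|Q) + h(\alpha_n)/n}{1-\alpha_n}.$$
Since $h(\alpha_n) \le \log 2$, letting $\alpha_n \to 0$ gives $\limsup_{n\to\infty} \frac{1}{n}\log\frac{1}{\beta_n} \le D(P\|Q)$.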