By applying the Huffman coding algorithm [1], we have
Let
Then
Here again,
Thus, by conditioning on the past K letters one may achieve a better compression,
if a different Huffman code-book is used for each
, namely
In many cases, even the
-th order statistics of the source are not available. This situation
calls for universal data-compression algorithms, that utilize a ``training-sequence"
of limited length which was either emitted by the same source, or is a curtailed, partial description
of the statistics of the source.
The first introductory case to be considered is where a sequence of n-letters is to be compressed.
Let
and let
(Empirical prob. measure).
Now, by the convexity of the entropy function
Now [2, 3], for small enough
(as compared with n) one can empirically evaluate
for each
vector and then apply the appropriate Huffman algorithm
for consecutive
-blocks. (The description of
takes about
bits). The length function may therefore be upper-bounded by:
where:
However, in this introductory case
the compression
is achieved at the cost of a coding delay
of n letters. We now introduce a more fruitful approach: