We start this subsection by taking a look at an example.
Example: Suppose that $\mathcal{S}$
is the model of the actual source.
The depth D of our context tree is 3.
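Then for the leaves of this model we can lower bound
the weighted probabilities
by the
$P_e$-terms, i.e. for every leaf $s \in \mathcal{S}$
$$P_w^s \ge \tfrac{1}{2} P_e(a_s, b_s),$$
while for the internal nodes we use the
$P_w^{0s} P_w^{1s}$-term as lower bound, hence for every internal node $s$
$$P_w^s \ge \tfrac{1}{2} P_w^{0s} P_w^{1s}.$$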
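From the example we may conclude that we lose a factor 1/2 in all
$|\mathcal{S}|$ leaves and
$|\mathcal{S}| - 1$ internal nodes of the actual model, hence
$$P_w^{\lambda} \ge 2^{1 - 2|\mathcal{S}|} \prod_{s \in \mathcal{S}} P_e(a_s, b_s).$$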
Using (8) we find an upper bound
that is
$2|\mathcal{S}| - 1$ bits more than the bound in (14), where the
model was known.
Therefore the bound on the individual redundancy is also
$2|\mathcal{S}| - 1$ bits larger than the bound in (15).
The increase of
$2|\mathcal{S}| - 1$ bits can be considered as the cost of not
knowing the model, i.e. the model redundancy.
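To make the counting of the factors 1/2 concrete, the chain of lower bounds can be written out for one small model. The model $\mathcal{S} = \{00, 10, 1\}$ used below is an illustrative assumption, as is the notation ($P_w^s$ for the weighted probability in node $s$, $P_e(a_s, b_s)$ for the estimated probability given $a_s$ zeros and $b_s$ ones, and $\lambda$ for the root). With $D = 3$ all three leaves lie above the maximal depth, so each of them still carries a weighting step:

```latex
\begin{align*}
P_w^{00} &\ge \tfrac12 P_e(a_{00}, b_{00}), \quad
P_w^{10} \ge \tfrac12 P_e(a_{10}, b_{10}), \quad
P_w^{1} \ge \tfrac12 P_e(a_{1}, b_{1}), \\
P_w^{0} &\ge \tfrac12 P_w^{00} P_w^{10}
          \ge 2^{-3} P_e(a_{00}, b_{00}) P_e(a_{10}, b_{10}), \\
P_w^{\lambda} &\ge \tfrac12 P_w^{0} P_w^{1}
          \ge 2^{-5} \prod_{s \in \mathcal{S}} P_e(a_s, b_s).
\end{align*}
```

Here $|\mathcal{S}| = 3$, so three leaves and two internal nodes each cost one factor 1/2, giving $2|\mathcal{S}| - 1 = 5$ bits.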
Note that (26) holds for all $T$
and all sequences $x_1^T$,
and for all models
$\mathcal{S}$ that fit in our context
tree.
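The lower bound on the weighted root probability, $P_w^{\lambda} \ge 2^{1-2|\mathcal{S}|} \prod_{s \in \mathcal{S}} P_e(a_s, b_s)$, can be checked numerically on short sequences. The following is a minimal sketch under stated assumptions, not a reference implementation: it assumes binary data, the Krichevsky-Trofimov estimator for $P_e$, the hypothetical model $\{00, 10, 1\}$, depth $D = 3$, and an arbitrary past `"010"`. All function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def kt(a, b):
    """Krichevsky-Trofimov probability of a sequence with a zeros and b ones
    (assumed estimator; any order of the symbols gives the same value)."""
    p = Fraction(1)
    for i in range(a):
        p *= Fraction(2 * i + 1, 2 * (i + 1))
    for j in range(b):
        p *= Fraction(2 * j + 1, 2 * (a + j + 1))
    return p


def counts(past, x, D):
    """Map each context s (a suffix of the past, length <= D) to [zeros, ones]."""
    assert len(past) >= D, "need at least D past symbols"
    c = {}
    seq = past + x
    for t in range(len(past), len(seq)):
        for length in range(D + 1):
            s = seq[t - length:t]  # the `length` most recent symbols
            c.setdefault(s, [0, 0])[int(seq[t])] += 1
    return c


def pw(s, c, D):
    """Weighted probability of node s; nodes at depth D use the estimate only."""
    a, b = c.get(s, (0, 0))
    pe = kt(a, b)
    if len(s) == D:
        return pe
    half = Fraction(1, 2)
    return half * pe + half * pw("0" + s, c, D) * pw("1" + s, c, D)


def check_bound(past, x, model, D):
    """Return (root weighted probability, lower bound for the given model)."""
    c = counts(past, x, D)
    prod = Fraction(1)
    for s in model:
        a, b = c.get(s, (0, 0))
        prod *= kt(a, b)
    bound = Fraction(1, 2 ** (2 * len(model) - 1)) * prod
    return pw("", c, D), bound


MODEL = ("00", "10", "1")  # hypothetical model, all leaves above depth D


def holds_for_all(T, past="010", model=MODEL, D=3):
    """Check the bound for every binary sequence of length T."""
    for bits in product("01", repeat=T):
        root, bound = check_bound(past, "".join(bits), model, D)
        if root < bound:
            return False
    return True


if __name__ == "__main__":
    print(holds_for_all(7))
```

Exact rational arithmetic (`Fraction`) is used so that the comparison is not affected by floating-point rounding. The check relies on every leaf of the model lying above depth $D$; a leaf at depth exactly $D$ has no weighting step and would not lose its factor 1/2.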