Suppose that $P_c^1(x_1^N)$ is a good coding distribution for source 1 and $P_c^2(x_1^N)$ for source 2.
Then the weighted distribution
$$P_c^w(x_1^N) = \frac{P_c^1(x_1^N) + P_c^2(x_1^N)}{2}$$
is a good coding distribution for both source 1 and 2.
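Proof: Let $i \in \{1, 2\}$, then, since the weighted distribution $P_c^w(x_1^N)$ is the average of the two coding distributions and is therefore at least $P_c^i(x_1^N)/2$,
$$\log_2 \frac{1}{P_c^w(x_1^N)} \le \log_2 \frac{2}{P_c^i(x_1^N)} = \log_2 \frac{1}{P_c^i(x_1^N)} + 1.$$
So the bound on the codeword length increases (see (8)) by 1 bit.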
In practice the increase is far less, especially if $P_c^1(x_1^N)$ and $P_c^2(x_1^N)$ are approximately equal.
Note that, if after observing $x_1^N$ we select the $i$ that minimizes $\log_2 (1/P_c^i(x_1^N))$, we lose exactly 1 bit.
This bit is now needed to specify the source index.
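The 1-bit bound can be checked numerically. The following is a minimal sketch, assuming two memoryless binary sources; the parameter values `THETA_1` and `THETA_2`, the test sequence `x`, and all function names are hypothetical choices made for illustration and are not taken from the text. It compares the ideal codeword length under the equally weighted distribution with the length obtained by selecting the better source and spending one extra bit on the source index.

```python
import math


def bernoulli_prob(x, theta):
    """Probability of the binary sequence x under a memoryless source with P(1) = theta."""
    ones = sum(x)
    zeros = len(x) - ones
    return theta ** ones * (1.0 - theta) ** zeros


def weighted_prob(x, theta_1, theta_2):
    """Equally weighted mixture of the two memoryless coding distributions."""
    return 0.5 * (bernoulli_prob(x, theta_1) + bernoulli_prob(x, theta_2))


def ideal_length(p):
    """Ideal codeword length in bits, log2(1/p)."""
    return -math.log2(p)


# Hypothetical parameters and sequence, chosen only for illustration.
THETA_1, THETA_2 = 0.1, 0.4
x = [0, 0, 1, 0, 0, 0, 0, 0, 1, 0]

len_1 = ideal_length(bernoulli_prob(x, THETA_1))
len_2 = ideal_length(bernoulli_prob(x, THETA_2))
len_w = ideal_length(weighted_prob(x, THETA_1, THETA_2))

# Selecting the better source afterwards costs exactly one bit for the index.
len_select = min(len_1, len_2) + 1.0

print(f"source 1: {len_1:.3f} bits, source 2: {len_2:.3f} bits")
print(f"weighted: {len_w:.3f} bits, select-and-signal: {len_select:.3f} bits")
```

Under these assumptions the weighted length never exceeds the select-and-signal length, and it is never shorter than the better of the two individual lengths, since the average of two probabilities cannot exceed the larger one.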
Example: Suppose sources 1 and 2 are memoryless with parameters
and
.
Then
,
, and
.
Hence
which is
close to
. Similarly,
is close to
.