next up previous
Next: Text compression Up: Unknown model Previous: MDL behavior

Complexity

In section 5 we have seen that we can use arithmetic coding if, after tex2html_wrap_inline2829 is processed and tex2html_wrap_inline3237 is known, it is easy to compute tex2html_wrap_inline3239 and tex2html_wrap_inline3241 . Does this hold for the CTW algorithm? Fortunately it does! See the example below.

Example:

   figure2006
Figure 7: Updated path of the weighted context tree for 0100110 followed by a 0 with past tex2html_wrap_inline2965 .

Suppose that the source has already generated the sequence 0100110 while the past symbols were tex2html_wrap_inline2965 . This resulted in the weighted context tree in figure 6. In the root of the context tree we found the coding probability tex2html_wrap_inline3277 . For processing the next source symbol we must compute the coding probability tex2html_wrap_inline3279 . This is done by (i) incrementing tex2html_wrap_inline2991 , (ii) updating tex2html_wrap_inline3083 , and (iii) updating tex2html_wrap_inline3187 , for all contexts tex2html_wrap_inline3287 , i.e. along the path in the context tree determined by the symbols preceding the next source symbol. The results of these computations are shown in figure 7. Doing so we find tex2html_wrap_inline3289 in the root of the context tree. In a similar way tex2html_wrap_inline3291 can be determined.

We conclude by stating that the number of operations necessary for processing all T source symbols is linear in T. Moreover, since we only need to store nodes in the context tree that actually did occur, and since for each symbol we can visit not more than D+1 nodes, the amount of memory needed to compress tex2html_wrap_inline2693 grows not faster than linear in T.



Ramesh Rao
Fri May 2 08:36:00 PDT 1997