We believe that context-tree weighting has simplified both the theory and the practice of statistical data compression. It is important to distinguish between the model and its parameters, and to realize that each of them contributes its own redundancy term. Good algorithms take care of both redundancies. The model redundancy of CTW is optimal only in the rather weak sense that the redundancy for some models can be decreased only by increasing the redundancy for other models. This is a consequence of weighting. Other weightings result in other model redundancy profiles; CTW, however, has the nice property that its model redundancy is (almost) proportional to the number of parameters.
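The claim that the model redundancy is (almost) proportional to the number of parameters can be made concrete with a small sketch. It assumes the binary CTW setting with weights of 1/2 at every internal node, for which the model cost of a tree model with leaf set S and maximum depth D is commonly stated as |S| - 1 + |{s in S : len(s) < D}| bits, which is at most 2|S| - 1. The function name `ctw_model_cost` and the representation of a model as a set of binary suffix strings are illustrative assumptions, not notation from this text.

```python
def ctw_model_cost(suffixes, max_depth):
    """Model cost in bits of a binary tree model under CTW weighting.

    Assumptions (not taken from the text above): the model is given as
    the set of leaf contexts of a proper, complete binary suffix tree,
    each written as a string over {'0', '1'} (the root is ''), and every
    internal node weights "stop here" and "split" equally with 1/2.
    """
    leaves = set(suffixes)
    if any(len(s) > max_depth for s in leaves):
        raise ValueError("a leaf lies deeper than max_depth")
    # Each internal node costs one bit; a proper binary tree with
    # |S| leaves has |S| - 1 internal nodes.
    internal_bits = len(leaves) - 1
    # Each leaf above the maximum depth costs one bit ("stop here");
    # leaves at the maximum depth cannot split and cost nothing.
    leaf_bits = sum(1 for s in leaves if len(s) < max_depth)
    return internal_bits + leaf_bits


# A memoryless model (root only), and a three-leaf model, both with D = 3.
for model in ({""}, {"00", "10", "1"}):
    cost = ctw_model_cost(model, max_depth=3)
    print(sorted(model), "parameters:", len(model), "model cost:", cost, "bits")
```

Since a binary tree model has one parameter per leaf, the cost lies between |S| - 1 and 2|S| - 1 bits, which is the sense in which it grows (almost) in proportion to the number of parameters.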
The CTW method is generally considered to be rather complex. A state-of-the-art implementation requires 32 MByte of RAM. Today this may seem a lot, but ten years from now it will surely be ``peanuts.'' A challenging problem is to find methods that improve the compression rate of, e.g., CTW by making use of the huge amounts of memory that will be available in the future. Of particular interest, of course, are methods that allow parallel implementation. We hope that the mini-course presented here will be a starting point for people interested in achieving this goal.