Bits Magazine ITSOC

Survey of Grammar-Based Data Structure Compression


A data string can be represented by a context-free grammar whose language consists of that string alone. The string can then be losslessly compressed indirectly, by encoding the grammar into a unique binary codeword. This approach, called grammar-based data compression, can also be used to losslessly compress graphical data structures, that is, graphs in which every vertex carries a data label. Under mild restrictions, grammar-based data compression schemes are universal compressors, meaning that asymptotically they perform at least as well as any finite-state compression scheme. This article surveys some of the theory of universal grammar-based compressors, discusses applications of grammar-based compressors to areas such as bioinformatics and data networks, and outlines future directions for grammar-based compression research, including compression issues arising in highly repetitive databases and in sparse graphical data.
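
To make the idea of a grammar whose language contains exactly one string concrete, here is a minimal Python sketch. The toy grammar, the names GRAMMAR, expand, and grammar_size, and the size measure (total number of right-hand-side symbols) are illustrative assumptions, not taken from the article, and the sketch does not include the binary encoding step.

```python
from typing import Dict, List

# Hypothetical toy grammar (an assumption for illustration only).
# Each nonterminal has exactly one production and none derives itself,
# so the grammar generates exactly one string.
GRAMMAR: Dict[str, List[str]] = {
    "S": ["B", "B"],
    "B": ["A", "A"],
    "A": ["a", "b"],
}


def expand(symbol: str, grammar: Dict[str, List[str]]) -> str:
    """Return the unique string derived from `symbol`."""
    if symbol not in grammar:
        # Anything without a production is treated as a terminal.
        return symbol
    return "".join(expand(s, grammar) for s in grammar[symbol])


def grammar_size(grammar: Dict[str, List[str]]) -> int:
    """Count the symbols on all right-hand sides (one simple size measure)."""
    return sum(len(rhs) for rhs in grammar.values())


if __name__ == "__main__":
    text = expand("S", GRAMMAR)
    print(text, len(text), grammar_size(GRAMMAR))
```

In this toy case the grammar has six right-hand-side symbols while the string it derives has eight characters; the gap widens as the data becomes more repetitive, which is what a grammar-based compressor would exploit when it encodes the grammar instead of the string.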

John C. Kieffer
University of Minnesota, Minneapolis, MN, USA
En-hui Yang
University of Waterloo, Waterloo, ON, Canada
