By Michael Drmota, TU Wien, Austria, michael.drmota@tuwien.ac.at | Wojciech Szpankowski, Purdue University, USA, spa@cs.purdue.edu
Lossless data compression is a facet of source coding and a well-studied problem of information theory. Its goal is to find a shortest possible code that can be unambiguously recovered. Here, we focus on rigorous analysis of code redundancy for known sources. The redundancy rate problem asks by how much the actual code length exceeds the optimal code length. We present precise analyses of three types of lossless data compression schemes, namely fixed-to-variable (FV) length codes, variable-to-fixed (VF) length codes, and variable-to-variable (VV) length codes. In particular, we investigate the average redundancy of Shannon, Huffman, Tunstall, Khodak and Boncelet codes. These codes have succinct representations as trees, either as coding or parsing trees, and we analyze here some of their parameters (e.g., the average path from the root to a leaf). Such trees are precisely analyzed by analytic methods, also known as analytic combinatorics, in which complex analysis plays a decisive role. These tools include generating functions, Mellin transform, Fourier series, the saddle point method, analytic poissonization and depoissonization, Tauberian theorems, and singularity analysis. The term analytic information theory has been coined to describe problems of information theory studied by analytic tools. This approach lies at the crossroad of information theory, analysis of algorithms, and combinatorics.
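To make the notion of redundancy concrete, here is a minimal Python sketch (illustrative only and not taken from the monograph; the source probabilities are arbitrary) that computes Shannon and Huffman codeword lengths for a small memoryless source and reports the average redundancy, i.e., the gap between the expected code length and the source entropy.

```python
import heapq
from math import ceil, log2

# Illustrative memoryless source; the probabilities are arbitrary.
probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}

entropy = -sum(p * log2(p) for p in probs.values())

# Shannon code: each symbol gets a codeword of length ceil(-log2 p).
shannon_len = {s: ceil(-log2(p)) for s, p in probs.items()}

# Huffman codeword lengths via the classical merging procedure.
heap = [(p, [s]) for s, p in probs.items()]
heapq.heapify(heap)
huffman_len = {s: 0 for s in probs}
while len(heap) > 1:
    p1, syms1 = heapq.heappop(heap)
    p2, syms2 = heapq.heappop(heap)
    for s in syms1 + syms2:          # every merge adds one bit to these symbols
        huffman_len[s] += 1
    heapq.heappush(heap, (p1 + p2, syms1 + syms2))

for name, lengths in [("Shannon", shannon_len), ("Huffman", huffman_len)]:
    avg = sum(probs[s] * lengths[s] for s in probs)
    print(f"{name}: average length {avg:.3f}, redundancy {avg - entropy:.3f} bits/symbol")
```

For these particular probabilities the Huffman code lands roughly 0.05 bits per symbol above the entropy, while the Shannon code sits roughly 0.15 bits per symbol above it.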
The approach of applying tools from the analysis of algorithms to problems of source coding and, more generally, to information theory lies at the crossroad of computer science and information theory. Combining the tools of both areas often yields powerful results; a prominent example is the collaboration of computer scientist Abraham Lempel and information theorist Jacob Ziv in the late 1970s, which produced the compression algorithms now widely referred to as Lempel-Ziv algorithms and which form the basis of the ZIP compression still used extensively in computing today.
This monograph surveys the use of these techniques for the rigorous analysis of code redundancy for known sources in lossless data compression. A separate chapter is devoted to a precise analysis of each of the three types of lossless data compression schemes, namely fixed-to-variable length codes, variable-to-fixed length codes, and variable-to-variable length codes. Each of these schemes is described in detail, building upon work done in the latter part of the 20th century to present new and powerful techniques. For the first time, this survey presents the redundancy of universal variable-to-fixed and variable-to-variable length codes in a comprehensive and coherent manner.
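As a complementary illustration of a variable-to-fixed scheme, the following sketch applies the classical Tunstall construction (with an arbitrary illustrative binary source and dictionary size, chosen here only for demonstration) by repeatedly splitting the most probable leaf of the parsing tree, and then compares the resulting code rate with the source entropy.

```python
from math import ceil, log2

# Illustrative binary memoryless source and target dictionary size.
probs = {"0": 0.8, "1": 0.2}
max_leaves = 8

# Tunstall construction: repeatedly expand the most probable leaf
# until the dictionary (the set of leaves) would exceed max_leaves.
leaves = {s: p for s, p in probs.items()}
while len(leaves) + len(probs) - 1 <= max_leaves:
    word = max(leaves, key=leaves.get)          # most probable parsing word
    p = leaves.pop(word)
    for s, ps in probs.items():                 # split it into its extensions
        leaves[word + s] = p * ps

entropy = -sum(p * log2(p) for p in probs.values())
avg_phrase_len = sum(p * len(w) for w, p in leaves.items())
bits_per_phrase = ceil(log2(len(leaves)))       # fixed-length output block
rate = bits_per_phrase / avg_phrase_len         # code bits per source symbol
print(f"{len(leaves)} parsing words, rate {rate:.3f} vs entropy {entropy:.3f} bits/symbol")
```

The gap between the printed rate and the entropy is the per-symbol redundancy of this toy variable-to-fixed code; the monograph's analyses make such gaps precise for Tunstall, Khodak and related codes.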
The monograph will be of interest to computer scientists and information theorists working on modern coding techniques. Written by two leading experts, it provides the reader with a unique, succinct starting point for their own research into the area.