Statistics, the science of extracting information from data, appears to be the most natural field of application of IT besides communication theory. Historically, an information measure was used by statisticians prior to Shannon (Fisher's information, Fisher 1925). I-divergence was first explicitly introduced for statistical purposes, though motivated by Shannon's work (Kullback and Leibler 1951). Implicitly, it had also played a role in earlier statistical works (Wald 1947, Good 1950), and Kullback soon developed a unified approach to testing statistical hypotheses based on this information measure (Kullback 1959).
Several results now regarded as applications of IT in statistics were actually established by statisticians independently of IT. ``Although Wald did not explicitly mention information in his treatment of sequential analysis, it should be noted that his work must be considered a major contribution to the statistical applications of IT'' (Kullback 1959, p. 2). The present author shares this view, and also regards the results in Subsection 3.2 below as applications of a typical IT tool, viz. the method of types. The proof of these results, however, preceded the development of the method of types in IT; indeed, it was one of the origins of that method. Some would prefer to speak in this context of an interplay between statistics and IT rather than of statistical applications of IT.
There are two major inference methods motivated by IT: the method of maximizing entropy (or minimizing I-divergence) and the method of minimizing ``description length.'' A full treatment of them is impossible here for lack of space, but is perhaps also unnecessary, since most information theorists have at least some familiarity with these methods. We will merely illustrate them by simple examples.