Information contraction is one of the most fundamental concepts in information theory, as evidenced by the numerous classical converse theorems that rely on it. This dissertation studies several problems aimed at better understanding this notion, broadly construed, within the intertwined realms of information theory, statistics, and discrete probability theory.
In information theory, the contraction of f-divergences, such as Kullback-Leibler (KL) divergence, χ²-divergence, and total variation (TV) distance, through channels (or the contraction of mutual f-information along Markov chains) is quantitatively captured by the well-known data processing inequalities. These inequalities can be tightened to produce “strong” data processing inequalities (SDPIs), which are obtained by introducing appropriate channel-dependent or source-channel-dependent “contraction coefficients.”
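As a small illustration of a channel-dependent contraction coefficient, the following sketch computes the TV contraction coefficient of a finite channel using Dobrushin's classical formula (the largest TV distance between two rows of the channel matrix) and empirically checks the resulting SDPI TV(PW, QW) ≤ η_TV(W) TV(P, Q) on random input distributions. It is written in Python with NumPy because the abstract itself contains no code; the function names (`tv_distance`, `dobrushin_coefficient`, `bsc`, `check_sdpi_tv`) and the example channels are hypothetical and not taken from the thesis.

```python
import itertools

import numpy as np


def tv_distance(p, q):
    """Total variation distance between two pmfs on the same finite alphabet."""
    return 0.5 * np.abs(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)).sum()


def dobrushin_coefficient(W):
    """TV contraction coefficient of a row-stochastic channel matrix W[x, y] = W(y|x).

    Dobrushin's formula: the largest TV distance between any two rows of W.
    """
    W = np.asarray(W, dtype=float)
    pairs = itertools.combinations(range(W.shape[0]), 2)
    return max((tv_distance(W[i], W[j]) for i, j in pairs), default=0.0)


def bsc(delta):
    """Hypothetical helper: binary symmetric channel with crossover probability delta."""
    return np.array([[1.0 - delta, delta], [delta, 1.0 - delta]])


def check_sdpi_tv(W, trials=1000, seed=0):
    """Empirically check TV(PW, QW) <= eta_TV(W) * TV(P, Q) on random input pmfs."""
    W = np.asarray(W, dtype=float)
    eta = dobrushin_coefficient(W)
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        P = rng.dirichlet(np.ones(W.shape[0]))
        Q = rng.dirichlet(np.ones(W.shape[0]))
        # Output distributions are input row vectors times the channel matrix;
        # a small tolerance absorbs rounding when the bound holds with equality.
        if tv_distance(P @ W, Q @ W) > eta * tv_distance(P, Q) + 1e-12:
            return False
    return True


# Usage: for a BSC the coefficient should equal |1 - 2*delta| up to rounding.
eta_bsc = dobrushin_coefficient(bsc(0.1))
# A random 3-input, 4-output channel (each row drawn from a Dirichlet distribution).
W_rand = np.random.default_rng(1).dirichlet(np.ones(4), size=3)
sdpi_holds = check_sdpi_tv(W_rand)
```

For a binary symmetric channel the bound holds with equality for every pair of inputs, which is why the check allows a tiny floating-point tolerance rather than a strict inequality.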
The thesis first proves various properties of contraction coefficients of source-channel pairs, and derives linear bounds on specific classes of such contraction coefficients in terms of the contraction coefficient for χ²-divergence (or the Hirschfeld-Gebelein-Rényi maximal correlation). Next, the thesis adopts a more statistical and machine learning perspective in elucidating the elegant geometry of SDPIs for χ²-divergence by developing modal decompositions of bivariate distributions based on singular value decompositions of conditional expectation operators.
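For finite alphabets, the link between maximal correlation, the χ²-divergence contraction coefficient, and an SVD of the conditional expectation operator can be sketched numerically. In orthonormal coordinates the operator is represented by the matrix B[x, y] = P_XY(x, y) / sqrt(P_X(x) P_Y(y)); its largest singular value is 1 (the trivial constant mode), its second largest is the Hirschfeld-Gebelein-Rényi maximal correlation, and the χ²-divergence contraction coefficient of the source-channel pair equals the square of that value. The sketch below is a standard construction under the stated assumptions (strictly positive marginals, second singular value below 1); the function names are hypothetical, and the normalization may differ from the one used in the thesis.

```python
import numpy as np


def modal_decomposition(P_XY):
    """SVD-based modal decomposition of a finite joint pmf given as a matrix P_XY[x, y].

    Returns singular values sigma (descending) and feature matrices f, g with
        P_XY(x, y) = P_X(x) * P_Y(y) * sum_i sigma[i] * f[x, i] * g[y, i].
    Mode 0 is the trivial one (sigma[0] = 1, f[:, 0] = g[:, 0] = 1); for i >= 1,
    f[:, i] and g[:, i] have zero mean and unit variance under P_X and P_Y.
    Assumes strictly positive marginals and sigma[1] < 1.
    """
    P_XY = np.asarray(P_XY, dtype=float)
    P_X, P_Y = P_XY.sum(axis=1), P_XY.sum(axis=0)
    # Conditional expectation operator in orthonormal coordinates:
    # B[x, y] = P_XY(x, y) / sqrt(P_X(x) * P_Y(y)).
    B = P_XY / np.sqrt(np.outer(P_X, P_Y))
    U, sigma, Vt = np.linalg.svd(B, full_matrices=False)
    # The top singular vectors are +/- sqrt(P_X) and +/- sqrt(P_Y); fix the sign
    # (flipping both keeps the product, hence the decomposition, unchanged).
    s = np.sign(U[:, 0].sum())
    U[:, 0] *= s
    Vt[0, :] *= s
    # Undo the sqrt-marginal weighting to obtain feature functions of x and y.
    f = U / np.sqrt(P_X)[:, None]
    g = Vt.T / np.sqrt(P_Y)[:, None]
    return sigma, f, g


def maximal_correlation(P_XY):
    """HGR maximal correlation: the second largest singular value of B."""
    sigma, _, _ = modal_decomposition(P_XY)
    return sigma[1] if sigma.size > 1 else 0.0


# Usage: uniform input through a BSC with crossover 0.1.
P_X = np.array([0.5, 0.5])
W = np.array([[0.9, 0.1], [0.1, 0.9]])
P_XY = P_X[:, None] * W
rho_max = maximal_correlation(P_XY)  # should equal |1 - 2*0.1| up to rounding
eta_chi2 = rho_max ** 2              # chi^2 contraction coefficient of (P_X, W)
```

In this doubly symmetric binary example, the maximal correlation should come out as 0.8 and the χ²-divergence contraction coefficient as 0.64; the remaining modes f[:, i], g[:, i] are the pairs of maximally correlated features, ordered by singular value.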