2021 Croucher Summer Course in Information Theory, The Chinese University of Hong Kong
A major barrier to sharing data between people and organizations is the legitimate concern about privacy. To address this concern, an active research area has focused on designing data perturbation mechanisms that preserve the usefulness of shared data for a given analytical task while limiting the data analyst's ability to infer sensitive information. However, there is often a fundamental and nontrivial trade-off between data privacy and utility. This lecture aims to provide a thorough understanding of this trade-off and of optimal ways to handle it. The primary focus will be on information-theoretic aspects of the problem. We will explore questions such as how to measure privacy leakage meaningfully and operationally using mutual information and its various generalizations in the literature, such as Sibson mutual information and maximal leakage. We will formulate the privacy-utility trade-off problem and explore theoretical bounds on optimal solutions, as well as novel privacy-preserving algorithms that achieve them. We will also present new connections between information-theoretic privacy and other privacy-preserving frameworks, most notably differential privacy, identifiability, and low-influence. The lecture aims to provide many illustrative and interactive examples to establish key concepts and to discuss practical applications of data perturbation for discrete-valued queries on datasets (such as counting or voting) and statistical queries on datasets (such as the mean or quantiles).
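As a small illustration of the leakage measures named above, the following sketch compares mutual information and maximal leakage for a single binary secret released through randomized response. Randomized response is chosen here only as a familiar example and is not taken from the lecture material. The function names, the uniform prior, and the privacy parameter `eps = ln 3` are all assumptions made for illustration. Maximal leakage is computed as log2 of the sum over outputs y of the largest P(y|x), which assumes the prior has full support.

```python
import math


def randomized_response(eps):
    """Binary channel P(y|x): report the true bit with probability e^eps / (1 + e^eps)."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return [[keep, 1.0 - keep],
            [1.0 - keep, keep]]


def mutual_information(prior, channel):
    """I(X;Y) in bits for a discrete prior P(x) and a channel given as rows P(y|x)."""
    n_y = len(channel[0])
    # Output marginal P(y) = sum_x P(x) P(y|x).
    p_y = [sum(prior[x] * channel[x][y] for x in range(len(prior)))
           for y in range(n_y)]
    mi = 0.0
    for x, p_x in enumerate(prior):
        for y in range(n_y):
            joint = p_x * channel[x][y]
            if joint > 0:
                mi += joint * math.log2(channel[x][y] / p_y[y])
    return mi


def maximal_leakage(channel):
    """Maximal leakage in bits: log2 of sum_y max_x P(y|x) (full-support prior assumed)."""
    n_y = len(channel[0])
    return math.log2(sum(max(row[y] for row in channel) for y in range(n_y)))


# Hypothetical setting: uniform binary secret, eps = ln 3 (true bit kept w.p. 0.75).
eps = math.log(3)
channel = randomized_response(eps)
prior = [0.5, 0.5]

print("mutual information (bits):", mutual_information(prior, channel))
print("maximal leakage (bits):   ", maximal_leakage(channel))
```

In this setting the maximal leakage is larger than the mutual information, which reflects that it is a worst-case measure that does not depend on the particular prior, whereas mutual information averages over the prior.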