What Do We Know About Matrix Estimation?

2018 ISIT Tutorial
Christina Lee Yu and Devavrat Shah


The task of estimating a matrix from noisy, partial observations of it has emerged over the past decade or so as a canonical challenge across a variety of fields spanning Inference, Machine Learning and Statistics.

Popular examples abound, including Recommendation Systems, Asymptotic Graph Theory (e.g. Graphons), Network Science (e.g. Community Detection), Social Data Processing (e.g. Ranking and Crowdsourcing), Causal Inference (e.g. Synthetic Control), Panel Data Analysis, Bioinformatics (e.g. DNA sequencing) and more.

The purpose of this tutorial is to provide a comprehensive survey of the algorithmic and analytic approaches developed over the past decade across the information sciences, broadly defined. The goal is to ground these developments in the context of a “universal” model through connections with the theory of exchangeability (i.e. De Finetti (1937), Aldous and Hoover (1980s)). Particular attention will be paid to the statistical and computational trade-offs that arise in this class of problems. Open questions pertaining to conjectured fundamental limits and to algorithms whose empirical success is not yet understood will also be discussed.

Christina Lee Yu (formerly Christina E. Lee) is an Assistant Professor at Cornell University in Operations Research and Information Engineering. Prior to Cornell, she was a postdoc at Microsoft Research New England. She received her PhD in 2017 and MS in 2013 in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology, in the Laboratory for Information and Decision Systems. She received her BS in Computer Science from the California Institute of Technology in 2011. She received an honorable mention for the 2018 INFORMS Dantzig Dissertation Award and participated in the 2016 EECS Rising Stars workshop hosted by CMU. Her research focuses on designing and analyzing scalable algorithms for processing social data based on principles from statistical inference.