ISIT 2021, Melbourne, Victoria, Australia
A large number of applications, such as hiring, admissions, crowdsourcing, A/B testing, product ratings and recommendations, peer grading, and peer review, rely on data obtained from people. Such data suffers from a number of issues, including biases, subjectivity, miscalibration, noise, and fraud. These issues lead to unfairness and inefficiencies in these applications, and are further amplified when the data is used to train ML/AI algorithms for downstream tasks.
In this tutorial, using peer review as a running example, we will discuss these challenges in human-provided data. We will first present insightful experiments, including analyses of thousands of submissions and reviews in machine learning conferences such as ICML and NeurIPS. We will then present mathematical models for these issues, and proposed solutions with theoretical guarantees. Finally, we will highlight important open problems. This is a wide open field where the information-theoretic perspective can contribute considerably, both in establishing the fundamental limits of these problems and in designing algorithms that achieve these limits and make a significant real-world impact. No prior background will be assumed.