Bias And Unfairness In Data From People: Challenges, Models, Solutions, And Open Problems
Carnegie Mellon University (Machine Learning and Computer Science Departments)

ISIT 2021, Melbourne, Victoria, Australia



A large number of applications, such as hiring, admissions, crowdsourcing, A/B testing, product ratings and recommendations, peer grading, and peer review, rely on data obtained from people. Such data suffers from a number of issues, including biases, subjectivity, miscalibration, noise, and fraud. These issues lead to unfairness and inefficiencies in these applications, and are further amplified when the data is used to train ML/AI algorithms for downstream tasks.

In this tutorial, using peer review as a running example, we will discuss these challenges in human-provided data. We will first present insightful experiments, including analyses of thousands of submissions and reviews in machine learning conferences such as ICML and NeurIPS. We will then present mathematical models for these issues, along with proposed solutions that have theoretical guarantees. Finally, we will highlight important open problems. This is a wide-open field where the information-theoretic perspective can contribute considerably, both in establishing the fundamental limits of these problems and in designing algorithms that achieve these limits and make a significant real-world impact. No prior background will be assumed.

His research interests include statistics, machine learning, information theory, and game theory, with a focus on learning from people. He is a recipient of an NSF CAREER Award 2020-25, the 2017 David J. Sakrison memorial prize from EECS Berkeley for a "truly outstanding and innovative PhD thesis", the Microsoft Research PhD Fellowship 2014-16, the Berkeley Fellowship 2011-13, the IEEE Data Storage Best Paper and Best Student Paper Awards for the years 2011/2012, and the SVC Aiya Medal 2010, and has supervised the Best Student Paper at AAMAS 2019.