IEEE ITW 2020, Riva del Garda, Italy
Data driven algorithm design for combinatorial problems is an important aspect of modern data science. Rather than using off the shelf algorithms that only have worst case performance guarantees, practitioners typically optimize over large families of parametrized algorithms and tune the parameters of these algorithms using a training set of problem instances from their domain to determine a configuration with high expected performance over future instances. However, most of this work comes with no performance guarantees. The challenge is that for many combinatorial problems, including partitioning and subset selection problems, a small tweak to the parameters can cause a cascade of changes in the algorithm's behavior, so the algorithm's performance is a discontinuous function of its parameters. In this talk, I will present new work that helps put data driven combinatorial algorithm selection on firm foundations. This includes strong computational and statistical performance guarantees, both for the batch and online scenarios where a collection of typical problem instances from the given application are presented either all at once or in an online fashion, respectively. I will describe both specific examples (for clustering, partitioning, and subset selection problems) and general principles that emerge in this context (including general techniques for sample complexity guarantees in the batch setting and no-regret guarantees in the online settings).