IEEE ITW 2020, Riva del Garda, Italy
The exploration-exploitation dilemma is a central challenge when making decisions under uncertainty. Most common approaches explore by favouring actions with uncertain outcomes. However, aleatoric uncertainty in the outcomes is different from epistemic uncertainty in the estimation task, thus the resulting observations may not necessarily be informative. In this talk, I will present approaches towards efficient information-directed exploration in stochastic multi-armed bandits, Bayesian optimization, reinforcement learning and a rich family of sequential decision problems called partial monitoring. These approaches use information measures for guiding exploration, and their submodularity allows to establish sublinear regret even in non-parametric settings. I will present the theoretical background, as well as empirical demonstrations on deep reinforcement learning tasks.