IEEE ITW 2020, Riva del Garda, Italy
Deep learning models are often so complex that they achieve vanishing classification error on the training set. Despite this huge complexity, the same architectures achieve small generalization error. This phenomenon has been rationalized in terms of a so-called double descent curve. As model complexity increases, the generalization error initially follows the usual U-shaped curve: it first decreases and then increases, peaking around the interpolation threshold (the point at which the model first achieves vanishing training error). Beyond this threshold, however, it descends again. I will focus on the case of a fully-connected two-layer neural network and consider its linearization around a random initial condition. I will show that many interesting phenomena can be demonstrated and mathematically understood in this simple setting. I will then describe a few open problems and directions for future research.
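The following is a minimal numerical sketch of the simplest special case of this setting, a random-features model. Only the second-layer weights are trained, and the first layer `W` stays frozen at its random initialization. This keeps only the second-layer term of the linearization; the full linearization also contains features coming from first-layer gradients. The ReLU activation, the linear teacher with Gaussian label noise, the dimensions, and the use of the minimum-norm least-squares fit (the solution gradient descent reaches from zero initialization) are all illustrative assumptions. The helper names `features` and `train_test_mse` are hypothetical. As the number of hidden units `N` sweeps past the number of training samples `n`, the training error should drop to numerical zero once `N >= n`. The test error typically spikes near `N ≈ n` and then descends again, although the exact values depend on the seed and the noise level.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def features(X, W):
    # Random ReLU features relu(<w_i, x>) / sqrt(N); W is the frozen random first layer (assumption: ReLU, no bias).
    return relu(X @ W.T) / np.sqrt(W.shape[0])


def train_test_mse(n_train, N, d=20, n_test=2000, noise=0.1, seed=0):
    """Fit only the second layer by min-norm least squares; return (train MSE, test MSE)."""
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(d)                          # hypothetical linear teacher
    X_tr = rng.standard_normal((n_train, d)) / np.sqrt(d)  # inputs with norm close to 1
    X_te = rng.standard_normal((n_test, d)) / np.sqrt(d)
    y_tr = X_tr @ beta + noise * rng.standard_normal(n_train)  # noisy training labels
    y_te = X_te @ beta                                     # noiseless test targets
    W = rng.standard_normal((N, d))                        # random first layer, never updated
    F_tr, F_te = features(X_tr, W), features(X_te, W)
    a = np.linalg.pinv(F_tr) @ y_tr                        # minimum-norm least-squares second layer
    train = np.mean((F_tr @ a - y_tr) ** 2)
    test = np.mean((F_te @ a - y_te) ** 2)
    return train, test


if __name__ == "__main__":
    n = 100  # number of training samples; the interpolation threshold is N = n
    for N in [10, 50, 90, 100, 110, 200, 800, 3200]:
        tr, te = train_test_mse(n, N)
        print(f"N={N:5d}  train MSE={tr:.2e}  test MSE={te:.2e}")
```

The pseudoinverse returns the exact least-squares fit below the threshold and the minimum-norm interpolating fit above it, so a single call covers both regimes of the curve.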