Universal hypothesis testing in the learning-limited regime
Benjamin Kelly Thitidej Tularak Aaron B. Wagner Pramod Viswanath
Proceedings of the IEEE International Symposium on Information Theory, Austin, TX, USA, June 2010

Given training sequences generated by two distinct, but unknown distributions sharing a common alphabet, we seek a classifier that can correctly decide whether a third test sequence is generated by the first or second distribution using only the training data. To model `limited learning' we allow the alphabet size to grow and therefore probability distributions to change with the blocklength. We prove that a natural choice, namely a generalized likelihood ratio test, is universally consistent (has a probability of error tending to zero with the blocklength for all underlying distributions) when the alphabet size is sub-linear in the blocklength, but inconsistent for linear alphabet growth. For up-to quadratic alphabet growth, in a regime where all probabilities are of the same order, we prove the universally consistency of a new test and show there are no such tests when the alphabet grows quadratically or faster.