Relative Flatness and Generalization in the Interpolation Regime
Traditional generalization bounds rely on bounding model capacity and therefore become vacuous in the \emph{interpolation} (over-parameterized) regime of modern machine learning, where models can fit the training data perfectly. This paper proposes a new approach to meaningful generalization bounds in the interpolation regime by decomposing the generalization gap into notions of \emph{representativeness} and \emph{feature robustness}. Representativeness captures properties of the data distribution and mitigates the dependence on the data dimension by exploiting the low-dimensional feature representation used implicitly by the model; feature robustness captures the expected change in loss under perturbations of these implicit features. We show that for models that locally minimize the training loss, feature robustness can be bounded by a relative flatness measure of the empirical loss surface. This yields an algorithm-agnostic bound that potentially explains the abundant empirical observations that flatness of the loss surface correlates with generalization.
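To make the flavor of such a flatness measure concrete, the sketch below computes a simplified, scale-normalized proxy: the squared parameter norm times the trace of the loss Hessian at a minimizer. This is only an illustration in the spirit of the relative flatness measure mentioned above, not the paper's precise layer-wise definition; the function names and the finite-difference Hessian are assumptions introduced here for the example.

```python
import numpy as np

def hessian_finite_diff(loss, w, eps=1e-3):
    """Approximate the Hessian of `loss` at parameters `w`
    with central finite differences (illustrative, O(d^2) evaluations)."""
    d = w.size
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            w_pp = w.copy(); w_pp[i] += eps; w_pp[j] += eps
            w_pm = w.copy(); w_pm[i] += eps; w_pm[j] -= eps
            w_mp = w.copy(); w_mp[i] -= eps; w_mp[j] += eps
            w_mm = w.copy(); w_mm[i] -= eps; w_mm[j] -= eps
            H[i, j] = (loss(w_pp) - loss(w_pm)
                       - loss(w_mp) + loss(w_mm)) / (4 * eps**2)
    return H

def relative_flatness_proxy(loss, w):
    """Scale-normalized flatness proxy: ||w||^2 * Tr(Hessian).
    Multiplying by the squared weight norm compensates for the
    reparameterization freedom that makes the raw Hessian trace
    scale-dependent."""
    H = hessian_finite_diff(loss, w)
    return float(w @ w) * float(np.trace(H))

# Toy quadratic loss L(w) = 0.5 * w^T A w, whose Hessian is A exactly,
# so the proxy should equal ||w||^2 * Tr(A) = 2 * 2.5 = 5.
A = np.diag([2.0, 0.5])
loss = lambda w: 0.5 * w @ A @ w
w = np.array([1.0, 1.0])
kappa = relative_flatness_proxy(loss, w)  # ≈ 5.0
```

On a quadratic the finite-difference Hessian is exact up to floating-point error, which makes the toy case easy to verify; for a real network one would instead use automatic differentiation to obtain Hessian traces.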