Transfer Learning Beyond Bounded Density Ratios

Abstract

We study the fundamental problem of transfer learning where a learning algorithm collects data from some source distribution $P$ but needs to perform well with respect to a different target distribution $Q$. A standard change of measure argument implies that transfer learning happens when the density ratio $dQ/dP$ is bounded. Yet, prior thought-provoking works by Kpotufe and Martinet (COLT, 2018) and Hanneke and Kpotufe (NeurIPS, 2019) demonstrate cases where the ratio $dQ/dP$ is unbounded, but transfer learning is possible. In this work, we focus on transfer learning over the class of low-degree polynomial estimators. Our main result is a general transfer inequality over the domain $\mathbb{R}^n$, proving that non-trivial transfer learning for low-degree polynomials is possible under very mild assumptions, going well beyond the classical assumption that $dQ/dP$ is bounded. For instance, it always applies if $Q$ is a log-concave measure and the inverse ratio $dP/dQ$ is bounded. To demonstrate the applicability of our inequality, we obtain new results in the settings of: (1) the classical truncated regression setting, where $dQ/dP$ is infinite, and (2) the more recent out-of-distribution generalization setting for in-context learning of linear functions with transformers. We also provide a discrete analogue of our transfer inequality on the Boolean hypercube $\{-1,1\}^n$, and study its connections with the recent problem of Generalization on the Unseen of Abbe, Bengio, Lotfi and Rizk (ICML, 2023). Our main conceptual contribution is that the maximum influence of the error of the estimator $\widehat{f}-f^*$ under $Q$, denoted $\mathrm{I}_{\max}(\widehat{f}-f^*)$, acts as a sufficient condition for transferability: when $\mathrm{I}_{\max}(\widehat{f}-f^*)$ is appropriately bounded, transfer is possible over the Boolean domain.
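
To make the two quantities in the abstract concrete, here is a minimal worked sketch in our own notation (not lifted from the paper): the change-of-measure bound behind the classical condition, with $L \ge 0$ a pointwise loss such as $(\widehat{f}(x)-f^*(x))^2$, assuming $Q$ is absolutely continuous with respect to $P$:

\[
\mathbb{E}_{x\sim Q}\big[L(x)\big] \;=\; \mathbb{E}_{x\sim P}\!\left[\frac{dQ}{dP}(x)\,L(x)\right] \;\le\; \left\|\frac{dQ}{dP}\right\|_{\infty} \mathbb{E}_{x\sim P}\big[L(x)\big].
\]

A bounded density ratio thus turns small source error into small target error; the paper's point is that this sufficient condition is far from necessary. On the Boolean domain, writing $g = \widehat{f}-f^*$ for the error function, one standard normalization of coordinate influence (the paper's exact definition may differ) is

\[
\mathrm{I}_i(g) \;=\; \mathbb{E}_{x\sim Q}\big[\big(\mathrm{D}_i g(x)\big)^2\big], \qquad \mathrm{D}_i g(x) \;=\; \frac{g(x^{i\to 1}) - g(x^{i\to -1})}{2}, \qquad \mathrm{I}_{\max}(g) \;=\; \max_{1\le i\le n} \mathrm{I}_i(g),
\]

where $x^{i\to b}$ denotes $x$ with its $i$-th coordinate set to $b$.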
