Source-Optimal Training is Transfer-Suboptimal

We prove that training a source model optimally for its own task is generically suboptimal when the objective is downstream transfer. We study the source-side optimization problem in L2-SP ridge regression and show a fundamental mismatch between the source-optimal and transfer-optimal source regularization strengths: outside of a measure-zero set of problem instances, the two differ. We characterize the transfer-optimal source penalty as a function of task alignment and identify an alignment-dependent reversal: under imperfect alignment, transfer benefits from stronger source regularization than is source-optimal, while in super-aligned regimes it benefits from weaker regularization. In isotropic settings, whether transfer helps is independent of the target sample size and noise level, depending only on task alignment and source-task characteristics. We verify the linear-regime predictions in a synthetic ridge regression experiment, and we present CIFAR-10 experiments as evidence that the source-optimal versus transfer-optimal mismatch can persist in nonlinear networks.
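The mismatch described above can be illustrated with a minimal synthetic sketch. The code below is not the paper's experiment; it is a hedged toy version under assumed settings: isotropic Gaussian features, a target weight vector that is an imperfectly aligned copy of the source weights (the `0.6`/`0.4` mixing, the dimensions, the noise level, and the fixed target-side L2-SP penalty `lam_tgt` are all illustrative choices). It sweeps the source ridge penalty and compares the penalty that minimizes source risk against the penalty that minimizes post-transfer target risk.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_src, n_tgt = 20, 100, 30
sigma = 0.5

# Hypothetical task pair: target weights are an imperfectly aligned
# mixture of the source weights and an independent direction.
w_src = rng.normal(size=d) / np.sqrt(d)
w_tgt = 0.6 * w_src + 0.4 * rng.normal(size=d) / np.sqrt(d)

def ridge(X, y, lam, w0=None):
    """L2-SP ridge estimator: argmin_w ||Xw - y||^2 + lam * ||w - w0||^2.
    With w0 = 0 this is ordinary ridge regression."""
    if w0 is None:
        w0 = np.zeros(X.shape[1])
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y + lam * w0)

def excess_risk(w, w_true):
    # For isotropic features, excess risk equals squared parameter error.
    return float(np.sum((w - w_true) ** 2))

Xs = rng.normal(size=(n_src, d))
ys = Xs @ w_src + sigma * rng.normal(size=n_src)
Xt = rng.normal(size=(n_tgt, d))
yt = Xt @ w_tgt + sigma * rng.normal(size=n_tgt)

lams = np.logspace(-2, 3, 60)   # candidate source penalties
lam_tgt = 5.0                   # fixed target-side L2-SP penalty (assumed)
src_risk, transfer_risk = [], []
for lam in lams:
    ws = ridge(Xs, ys, lam)                 # source model trained with penalty lam
    src_risk.append(excess_risk(ws, w_src)) # quality on the source task itself
    wt = ridge(Xt, yt, lam_tgt, w0=ws)      # L2-SP fine-tuning anchored at ws
    transfer_risk.append(excess_risk(wt, w_tgt))

lam_src_opt = lams[int(np.argmin(src_risk))]
lam_transfer_opt = lams[int(np.argmin(transfer_risk))]
print("source-optimal penalty:", lam_src_opt)
print("transfer-optimal penalty:", lam_transfer_opt)
```

On typical draws the two minimizers differ, in line with the abstract's claim that the source-optimal penalty is generically not the transfer-optimal one; the exact values depend on the random seed and the assumed constants.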