μLO: Compute-Efficient Meta-Generalization of Learned Optimizers
Learned optimizers (LOs) can significantly reduce the wall-clock training time of neural networks, substantially reducing training costs. However, they can struggle to optimize unseen tasks (meta-generalize), especially when training networks wider than those seen during meta-training. To address this, we derive the Maximal Update Parametrization (μP) for two state-of-the-art learned optimizer architectures and propose a simple meta-training recipe for μ-parameterized LOs (μLOs). Our empirical evaluation demonstrates that μLOs meta-trained with our recipe substantially improve meta-generalization to wider unseen tasks when compared to LOs trained under standard parametrization (SP), as is done in existing work. We also empirically observe that μLOs trained with our recipe exhibit unexpectedly improved meta-generalization to deeper networks (5× meta-training depth) and surprising generalization to much longer training horizons (25× meta-training horizon) when compared to SP LOs.
View on arXiv
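For readers unfamiliar with the Maximal Update Parametrization referenced above, the sketch below illustrates the general idea: per-parameter update scales are chosen as a function of fan-in so that optimization dynamics stay stable as network width grows. The scaling rule shown follows the standard μP prescription for Adam-like optimizers; the `apply_learned_optimizer_step` interface and parameter naming are hypothetical stand-ins, not the paper's actual derivation for its learned-optimizer architectures.

```python
import numpy as np

def mup_update_scale(kind: str, fan_in: int) -> float:
    """Illustrative muP scale for Adam-like updates: matrix-like
    (hidden/readout) weights are scaled by 1/fan_in so feature
    updates stay O(1) as width grows; vector-like parameters
    (biases, embeddings) keep an O(1) scale."""
    return 1.0 if kind in ("bias", "embedding") else 1.0 / fan_in

def apply_learned_optimizer_step(params, raw_updates, kinds, fan_ins, lr=1e-3):
    """Rescale a learned optimizer's raw per-parameter updates with the
    muP factors above before applying them. `raw_updates` stands in for
    the LO network's per-parameter output (hypothetical interface)."""
    return {
        name: p - lr * mup_update_scale(kinds[name], fan_ins[name]) * raw_updates[name]
        for name, p in params.items()
    }

# Toy usage: one hidden layer of width 512.
width = 512
params = {"hidden/w": np.random.randn(width, width) / np.sqrt(width),
          "hidden/b": np.zeros(width)}
kinds = {"hidden/w": "hidden", "hidden/b": "bias"}
fan_ins = {"hidden/w": width, "hidden/b": width}
raw = {k: np.random.randn(*v.shape) for k, v in params.items()}
params = apply_learned_optimizer_step(params, raw, kinds, fan_ins)
```

Because the 1/fan_in factor shrinks hidden-weight updates as width increases, hyperparameters (and, per the abstract, a learned optimizer meta-trained at small width) can transfer to wider networks without retuning.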