Beyond Discretization: Learning the Optimal Solution Path

International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
9 pages (main) · 12 pages (appendix) · 1 page (bibliography) · 9 figures · 1 table
Abstract

Many applications require minimizing a family of optimization problems indexed by a hyperparameter $\lambda \in \Lambda$ to obtain an entire solution path. Traditional approaches proceed by discretizing $\Lambda$ and solving a series of optimization problems. We propose an alternative approach that parameterizes the solution path with a set of basis functions and solves a \emph{single} stochastic optimization problem to learn the entire solution path. Our method offers substantial complexity improvements over discretization. When using constant step-size SGD, the uniform error of our learned solution path relative to the true path exhibits linear convergence to a constant related to the expressiveness of the basis. When the true solution path lies in the span of the basis, this constant is zero. We also prove stronger results for special cases common in machine learning: when $\lambda \in [-1, 1]$ and the solution path is $\nu$-times differentiable, constant step-size SGD learns a path with $\epsilon$ uniform error after at most $O(\epsilon^{\frac{1}{1-\nu}} \log(1/\epsilon))$ iterations, and when the solution path is analytic, it requires only $O(\log^2(1/\epsilon) \log\log(1/\epsilon))$ iterations. By comparison, the best-known discretization schemes in these settings require at least $O(\epsilon^{-1/2})$ discretization points (and even more gradient calls). Finally, we propose an adaptive variant of our method that sequentially adds basis functions, and we demonstrate its strong numerical performance through experiments.
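The core idea described in the abstract can be sketched on a toy problem. The snippet below is an illustrative reconstruction, not the paper's implementation: it learns a ridge-regression solution path $\theta(\lambda)$ over $\lambda \in [-1, 1]$ by parameterizing $\theta(\lambda) = W\,\phi(\lambda)$ with a Chebyshev basis $\phi$ and running SGD on a single stochastic objective where both the data index and $\lambda$ are sampled each step. All names, constants, and the late-iterate averaging are assumptions for the sketch.

```python
import numpy as np

# Toy family of problems (illustrative, not from the paper): for each
# lam in [-1, 1], minimize over theta
#   h(theta, lam) = E_i[0.5*(x_i @ theta - y_i)^2] + 0.5*mu(lam)*||theta||^2,
# where mu maps lam to a positive ridge weight.
rng = np.random.default_rng(0)
n, d, k = 200, 5, 6                      # samples, features, basis size
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def basis(lam):
    """Chebyshev polynomials T_0..T_{k-1}, a natural basis on [-1, 1]."""
    return np.polynomial.chebyshev.chebvander(lam, k - 1).ravel()

def mu(lam):
    return 0.5 * (lam + 1.0) + 1e-2      # positive ridge weight (assumed map)

# W parameterizes the whole path at once: theta(lam) = W @ basis(lam).
W = np.zeros((d, k))
W_avg = np.zeros_like(W)
step, T = 0.02, 40000
for t in range(T):
    lam = rng.uniform(-1.0, 1.0)         # sample a hyperparameter value
    i = rng.integers(n)                  # sample a data point
    phi = basis(lam)
    theta = W @ phi
    # stochastic gradient of h at theta(lam), pushed through theta = W phi
    g = (X[i] @ theta - y[i]) * X[i] + mu(lam) * theta
    W -= step * np.outer(g, phi)
    if t >= T // 2:                      # average late iterates to damp noise
        W_avg += W / (T - T // 2)

# The learned path can now be queried at any lam without re-solving.
lam0 = 0.3
theta_hat = W_avg @ basis(lam0)
theta_star = np.linalg.solve(X.T @ X / n + mu(lam0) * np.eye(d), X.T @ y / n)
```

A single training run yields the entire path: evaluating `W_avg @ basis(lam)` at any new `lam` is a small matrix-vector product, whereas a discretization scheme would have solved a fresh optimization problem per grid point. Here the solution path is analytic in `lam`, so a low-degree Chebyshev basis already tracks the closed-form ridge solution `theta_star` closely.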
