Scaling Gaussian Processes for Learning Curve Prediction via Latent Kronecker Structure

Abstract

A key task in AutoML is to model learning curves of machine learning models jointly as a function of model hyper-parameters and training progression. While Gaussian processes (GPs) are suitable for this task, naïve GPs require $\mathcal{O}(n^3 m^3)$ time and $\mathcal{O}(n^2 m^2)$ space for $n$ hyper-parameter configurations and $\mathcal{O}(m)$ learning curve observations per hyper-parameter. Efficient inference via Kronecker structure is typically incompatible with early stopping due to missing learning curve values. We impose \textit{latent Kronecker structure} to leverage efficient product kernels while handling missing values. In particular, we interpret the joint covariance matrix of observed values as the projection of a latent Kronecker product. Combined with iterative linear solvers and structured matrix-vector multiplication, our method only requires $\mathcal{O}(n^3 + m^3)$ time and $\mathcal{O}(n^2 + m^2)$ space. We show that our GP model can match the performance of a Transformer on a learning curve prediction task.
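
To make the "projection of a latent Kronecker product" concrete, here is a minimal NumPy/SciPy sketch; it is not the authors' implementation, and the names `latent_kron_mvm`, `Kn`, `Km`, and `mask` are assumptions. It multiplies a vector by $P(K_n \otimes K_m)P^\top$, where $P$ selects the observed learning-curve entries, without ever forming the $nm \times nm$ joint covariance. A matrix-vector product of exactly this kind is the only primitive an iterative linear solver such as conjugate gradients needs.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def latent_kron_mvm(Kn, Km, mask, u):
    """Multiply u by the projected Kronecker covariance P (Kn ⊗ Km) P^T.

    Kn:   (n, n) kernel matrix over hyper-parameter configurations
    Km:   (m, m) kernel matrix over learning-curve steps (e.g. epochs)
    mask: (n, m) boolean array, True where a curve value was observed
    u:    (mask.sum(),) vector with one entry per observed value
    """
    V = np.zeros(mask.shape)
    V[mask] = u            # P^T u: scatter observations onto the latent grid
    # (Kn ⊗ Km) vec(V) via the identity (A ⊗ B) vec(X) = vec(A X B^T)
    # for row-major vec, so the nm x nm matrix is never materialized.
    W = Kn @ V @ Km.T
    return W[mask]         # P: gather the observed entries back

# Toy problem: n = 3 configurations, m = 4 epochs, two curves stopped early.
rng = np.random.default_rng(0)
Xn = rng.normal(size=(3, 2))                  # hyper-parameter vectors
tm = np.linspace(0.0, 1.0, 4)[:, None]        # normalized training progression
Kn = np.exp(-0.5 * ((Xn[:, None] - Xn[None]) ** 2).sum(-1))  # RBF over configs
Km = np.exp(-0.5 * (tm - tm.T) ** 2)                         # RBF over epochs
mask = np.array([[1, 1, 1, 1],
                 [1, 1, 0, 0],                # early-stopped after epoch 2
                 [1, 1, 1, 0]], dtype=bool)
y = rng.normal(size=mask.sum())               # placeholder observed values

# Solve (P (Kn ⊗ Km) P^T + noise * I) alpha = y with conjugate gradients,
# touching the covariance only through matrix-vector products.
noise = 1e-2
A = LinearOperator((y.size, y.size),
                   matvec=lambda u: latent_kron_mvm(Kn, Km, mask, u) + noise * u)
alpha, info = cg(A, y)                        # info == 0 signals convergence
```

Because the solver accesses the covariance only through this product, the largest dense matrices ever held in memory are $K_n$ and $K_m$ themselves, consistent with the quoted $\mathcal{O}(n^2 + m^2)$ space.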
