Do Subsampled Newton Methods Work for High-Dimensional Data?

Abstract

Subsampled Newton methods approximate the Hessian matrix through subsampling, alleviating the cost of forming the full Hessian while retaining sufficient curvature information. However, previous results require $\Omega(d)$ samples to approximate the Hessian, where $d$ is the dimension of the data points, making such methods practically infeasible for high-dimensional data. The situation deteriorates further when $d$ is comparable to the number of data points $n$: the approximation then has to draw on essentially the whole dataset, rendering subsampling useless. This paper theoretically justifies the effectiveness of subsampled Newton methods on high-dimensional data. Specifically, we prove that only $\widetilde{\Theta}(d^{\gamma}_{\rm eff})$ samples are needed to approximate the Hessian, where $d^{\gamma}_{\rm eff}$ is the $\gamma$-ridge leverage effective dimension, which can be much smaller than $d$ as long as $n\gamma \gg 1$. Additionally, we extend this result so that subsampled Newton methods also work for high-dimensional data in both distributed optimization and non-smooth regularized problems.
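To make the core idea concrete, here is a minimal sketch (not the paper's algorithm) of one subsampled Newton step for $\ell_2$-regularized logistic regression: the full gradient is computed exactly, while the Hessian is estimated from a uniform subsample of `s` rows. The sample size `s` stands in for the $\widetilde{\Theta}(d^{\gamma}_{\rm eff})$ bound in the abstract, and all names and parameters below are illustrative assumptions; the paper's analysis concerns leverage-based sampling, which would tighten the uniform scheme shown here.

```python
import numpy as np

def subsampled_newton_step(X, y, w, gamma, s, rng):
    """One subsampled Newton step for L2-regularized logistic regression.

    X : (n, d) data matrix, y : (n,) labels in {0, 1},
    w : (d,) current iterate, gamma : ridge parameter,
    s : Hessian subsample size (the quantity the paper bounds).
    """
    n, d = X.shape
    # Full gradient of the mean logistic loss plus ridge term (one data pass).
    p = 1.0 / (1.0 + np.exp(-X @ w))            # sigmoid predictions
    grad = X.T @ (p - y) / n + gamma * w
    # Hessian estimate from s uniformly subsampled rows.
    idx = rng.choice(n, size=s, replace=False)
    Xs, ps = X[idx], p[idx]
    D = ps * (1.0 - ps)                          # per-sample curvature weights
    H = (Xs * D[:, None]).T @ Xs / s + gamma * np.eye(d)
    # Newton update with the subsampled Hessian.
    return w - np.linalg.solve(H, grad)

# Toy usage on synthetic data (all sizes are arbitrary choices).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))
y = (X @ rng.standard_normal(50) > 0).astype(float)
w = np.zeros(50)
for _ in range(10):
    w = subsampled_newton_step(X, y, w, gamma=1e-2, s=200, rng=rng)
```

For reference, the effective dimension appearing in the bound is commonly defined in the ridge-leverage literature as $d^{\gamma}_{\rm eff} = \mathrm{tr}\big(H (H + \gamma I)^{-1}\big)$, which is at most $d$ and shrinks as $\gamma$ grows.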
