
A Random Matrix Approach to Neural Networks

Abstract

This article studies the Gram random matrix model $G=\frac1T\Sigma^{\rm T}\Sigma$, $\Sigma=\sigma(WX)$, classically found in the analysis of random feature maps and random neural networks, where $X=[x_1,\ldots,x_T]\in{\mathbb R}^{p\times T}$ is a (data) matrix of bounded norm, $W\in{\mathbb R}^{n\times p}$ is a matrix of independent zero-mean unit-variance entries, and $\sigma:{\mathbb R}\to{\mathbb R}$ is a Lipschitz continuous (activation) function, with $\sigma(WX)$ understood entry-wise. By means of a key concentration of measure lemma arising from non-asymptotic random matrix arguments, we prove that, as $n,p,T$ grow large at the same rate, the resolvent $Q=(G+\gamma I_T)^{-1}$, for $\gamma>0$, behaves similarly to its counterpart in sample covariance matrix models, involving notably the moment $\Phi=\frac{T}{n}{\mathbb E}[G]$, which in passing provides a deterministic equivalent for the empirical spectral measure of $G$. Application-wise, this result enables the estimation of the asymptotic performance of single-layer random neural networks. This, in turn, provides practical insights into the underlying mechanisms at play in random neural networks, entailing several unexpected consequences, as well as a fast practical means to tune the network hyperparameters.
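As a concrete illustration of the objects defined in the abstract, the following minimal Python/NumPy sketch builds the random feature matrix $\Sigma=\sigma(WX)$, the Gram matrix $G$, and the resolvent $Q$. The dimensions, the unit-norm column normalization of $X$, and the ReLU activation are illustrative assumptions, not prescriptions from the paper.

```python
import numpy as np

# Illustrative dimensions (the paper's regime has n, p, T growing at the same rate)
n, p, T = 800, 400, 1200
gamma = 1.0  # regularization parameter gamma > 0

rng = np.random.default_rng(0)

# Data matrix X in R^{p x T} of bounded norm (here: unit-norm columns, an assumption)
X = rng.standard_normal((p, T))
X /= np.linalg.norm(X, axis=0, keepdims=True)

# Random weights W in R^{n x p} with i.i.d. zero-mean unit-variance entries
W = rng.standard_normal((n, p))

# Entry-wise Lipschitz activation (ReLU chosen here for illustration)
sigma = lambda t: np.maximum(t, 0.0)

# Feature matrix Sigma = sigma(WX) and Gram matrix G = (1/T) Sigma^T Sigma
Sigma = sigma(W @ X)
G = (Sigma.T @ Sigma) / T

# Resolvent Q = (G + gamma I_T)^{-1}, the central object of the analysis
Q = np.linalg.inv(G + gamma * np.eye(T))

# Empirical spectrum of G, whose large-dimensional behavior the deterministic
# equivalent involving Phi = (T/n) E[G] describes
eigvals = np.linalg.eigvalsh(G)
print("trace(Q)/T =", np.trace(Q) / T, "| largest eigenvalue of G =", eigvals[-1])
```

The printed normalized trace $\frac1T\operatorname{tr}Q$ is the quantity whose deterministic approximation the paper characterizes in the large $n,p,T$ limit.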
