
On approximating $\nabla f$ with neural networks

Abstract

Consider a feedforward neural network $\psi: \mathbb{R}^d \rightarrow \mathbb{R}^d$ such that $\psi \approx \nabla f$, where $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a smooth function; therefore $\psi$ must satisfy $\partial_j \psi_i = \partial_i \psi_j$ pointwise. We prove a theorem stating that for any such network $\psi$, and for any depth $L > 2$, all the input weights must be parallel to each other. In other words, $\psi$ can only represent one feature in its first hidden layer. The proof of the theorem is straightforward, where two backward paths (from $i$ to $j$ and from $j$ to $i$) and a weight-tying matrix (connecting the last and first hidden layers) play the key roles. We thus make a strong theoretical case in favor of the implicit parametrization, where the neural network is $\phi: \mathbb{R}^d \rightarrow \mathbb{R}$ and $\nabla \phi \approx \nabla f$. Throughout, we revisit two recent unnormalized probabilistic models that are formulated as $\psi \approx \nabla f$, and we also discuss denoising autoencoders at the end.
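To make the implicit parametrization concrete, the sketch below (not taken from the paper; the network shape, names `phi` and `psi`, and dimensions are illustrative assumptions) builds a scalar network $\phi: \mathbb{R}^d \rightarrow \mathbb{R}$ and obtains $\psi = \nabla \phi$ by automatic differentiation. Since $\psi$ is then a gradient field by construction, its Jacobian is the Hessian of $\phi$ and thus satisfies $\partial_j \psi_i = \partial_i \psi_j$ automatically, which the last line verifies numerically:

```python
# Minimal sketch of the implicit parametrization, assuming a
# one-hidden-layer scalar network with smooth (tanh) activations.
import jax
import jax.numpy as jnp

d, hidden = 4, 16
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
params = {
    "W1": jax.random.normal(k1, (hidden, d)) / jnp.sqrt(d),
    "b1": jnp.zeros(hidden),
    "w2": jax.random.normal(k2, (hidden,)) / jnp.sqrt(hidden),
}

def phi(params, x):
    # Scalar-valued feedforward network phi: R^d -> R.
    h = jnp.tanh(params["W1"] @ x + params["b1"])
    return params["w2"] @ h

# psi = grad(phi) is a gradient field R^d -> R^d by construction.
psi = jax.grad(phi, argnums=1)

x = jax.random.normal(jax.random.PRNGKey(1), (d,))
J = jax.jacobian(psi, argnums=1)(params, x)  # Hessian of phi at x
print(jnp.allclose(J, J.T, atol=1e-5))  # True: d_j psi_i == d_i psi_j
```

By contrast, a generic vector-valued network $\psi: \mathbb{R}^d \rightarrow \mathbb{R}^d$ has no such symmetry built in, which is the constraint the theorem shows forces all input weights of a direct parametrization to be parallel.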
