On approximating $\nabla f$ with neural networks
Consider a feedforward neural network $\psi: \mathbb{R}^d \rightarrow \mathbb{R}^d$ such that $\psi \approx \nabla f$, where $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a smooth function; therefore $\psi$ must satisfy $\partial_j \psi_i = \partial_i \psi_j$ pointwise, since the Jacobian of a gradient field is the (symmetric) Hessian of $f$. We prove a theorem that for any such network, and for any depth $L$, all the input weights must be parallel to each other. In other words, $\psi$ can only represent one feature in its first hidden layer. The proof of the theorem is straightforward, where two backward paths (from $i$ to $j$ and from $j$ to $i$) and a weight-tying matrix (connecting the last and first hidden layers) play the key roles. We thus make a strong theoretical case in favor of the parametrization where the neural network is $\phi: \mathbb{R}^d \rightarrow \mathbb{R}$ and $\psi = \nabla \phi$. Throughout, we revisit two recent unnormalized probabilistic models that are formulated as $\psi \approx \nabla f$, and we conclude with a discussion of denoising autoencoders.
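As a minimal sketch (an illustration, not code from the paper), the $\psi = \nabla \phi$ parametrization can be realized with automatic differentiation: define a scalar-valued network $\phi$ and take $\psi = \nabla \phi$, so the symmetry constraint $\partial_j \psi_i = \partial_i \psi_j$ holds by construction. The two-layer tanh architecture, layer sizes, and JAX usage below are illustrative assumptions.

```python
# A minimal sketch (illustrative, not the paper's code) of the psi = grad(phi)
# parametrization in JAX. phi is a scalar-valued network, and psi is obtained
# by automatic differentiation, so d_j psi_i = d_i psi_j holds by construction.
import jax
import jax.numpy as jnp

def phi(params, x):
    """Hypothetical two-layer scalar network phi: R^d -> R."""
    W1, b1, w2 = params
    hidden = jnp.tanh(W1 @ x + b1)  # first hidden layer features
    return w2 @ hidden              # scalar output

# psi = grad(phi): a conservative vector field by construction.
psi = jax.grad(phi, argnums=1)

d, n_hidden = 3, 16
k1, k2, k3, k4 = jax.random.split(jax.random.PRNGKey(0), 4)
params = (jax.random.normal(k1, (n_hidden, d)),   # input weights W1
          jax.random.normal(k2, (n_hidden,)),     # biases b1
          jax.random.normal(k3, (n_hidden,)))     # output weights w2

x = jax.random.normal(k4, (d,))
J = jax.jacobian(psi, argnums=1)(params, x)  # Jacobian of psi = Hessian of phi
print(jnp.allclose(J, J.T))  # True: symmetric, as required of a gradient field
```

By contrast, a generic network $\psi: \mathbb{R}^d \rightarrow \mathbb{R}^d$ would have to satisfy this symmetry as a constraint, which is what forces its input weights to be parallel.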