In this paper we focus on the linear algebra theory behind feedforward (FNN) and recurrent (RNN) neural networks. We review backward propagation, including backward propagation through time (BPTT). Also, we obtain a new exact expression for the Hessian, which represents second order effects. We show that for $t$ time steps the weight gradient can be expressed as a rank-$t$ matrix, while the weight Hessian is a sum of Kronecker products of rank-$1$ and $W^T A W$ matrices, for some matrix $A$ and weight matrix $W$. Also, we show that for a mini-batch of size $b$, the weight update can be expressed as a rank-$bt$ matrix. Finally, we briefly comment on the eigenvalues of the Hessian matrix.
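The rank-$t$ claim for the gradient can be seen directly from BPTT: the gradient with respect to the recurrent weight matrix accumulates as a sum of $t$ outer products, one per time step. Below is a minimal numerical sketch (not taken from the paper) for a vanilla tanh RNN with a simple quadratic loss; the hidden size, inputs, and loss are illustrative assumptions, chosen only to verify that the numerical rank of the accumulated gradient does not exceed $t$.

```python
# Sketch: BPTT for a vanilla RNN, showing dL/dW as a sum of t rank-1 terms.
# All dimensions, inputs, and the loss are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n, t = 8, 3                       # hidden size and number of time steps (t < n)
W = 0.1 * rng.standard_normal((n, n))
xs = rng.standard_normal((t, n))  # inputs already projected to hidden size
h0 = rng.standard_normal(n)

# Forward pass: h_k = tanh(W h_{k-1} + x_k); loss L = 0.5 * ||h_t||^2
hs, h = [], h0
for k in range(t):
    h = np.tanh(W @ h + xs[k])
    hs.append(h)

# Backward pass (BPTT): dL/dW = sum_k (dL/da_k) h_{k-1}^T, a sum of t rank-1 terms
grad_W = np.zeros((n, n))
delta = hs[-1]                            # dL/dh_t for the quadratic loss
for k in reversed(range(t)):
    pre = delta * (1.0 - hs[k] ** 2)      # backprop through tanh: dL/da_k
    h_prev = hs[k - 1] if k > 0 else h0
    grad_W += np.outer(pre, h_prev)       # one rank-1 contribution per step
    delta = W.T @ pre                     # propagate to the previous hidden state

print("numerical rank of dL/dW:", np.linalg.matrix_rank(grad_W), "<= t =", t)
```

Running the sketch prints a rank of at most 3 for the accumulated gradient, consistent with the rank-$t$ structure; a mini-batch of $b$ such sequences would add $b$ of these sums, giving the rank-$bt$ bound on the weight update.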