
Time-Scale Coupling Between States and Parameters in Recurrent Neural Networks

Main: 9 pages, 16 figures
Bibliography: 1 page
Appendix: 13 pages
Abstract

We study how gating mechanisms in recurrent neural networks (RNNs) implicitly induce adaptive learning-rate behavior, even when training is carried out with a fixed, global learning rate. This effect arises from the coupling between state-space time scales (parametrized by the gates) and parameter-space dynamics during gradient descent. By deriving exact Jacobians for leaky-integrator and gated RNNs, we obtain a first-order expansion that makes explicit how constant, scalar, and multi-dimensional gates reshape gradient propagation, modulate effective step sizes, and introduce anisotropy in parameter updates. These findings reveal that gates not only control memory retention in the hidden states, but also act as data-driven preconditioners that adapt optimization trajectories in parameter space. We further draw formal analogies with learning-rate schedules, momentum, and adaptive methods such as Adam, showing that these optimization behaviors emerge naturally from gating. Numerical experiments confirm the validity of our perturbative analysis, supporting the view that gate-induced corrections remain small while exerting systematic effects on training dynamics. Overall, this work provides a unified dynamical-systems perspective on how gating couples state evolution with parameter updates, explaining why gated architectures achieve robust trainability and stability in practice.
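As a rough illustration of the state-parameter coupling described above, the sketch below unrolls a leaky-integrator RNN and compares gradient magnitudes with respect to the recurrent weights for different gate values. The update rule, variable names (alpha, W_h), and the toy quadratic readout loss are assumptions for illustration only, not the paper's exact setup.

```python
# Minimal sketch (assumed leaky-integrator update:
#   h_{t+1} = (1 - alpha) * h_t + alpha * tanh(W_h @ h_t + x_t),
# with a toy quadratic readout loss; not the paper's exact model).
import jax
import jax.numpy as jnp

def rollout_loss(W_h, alpha, xs, h0):
    """Unroll the leaky-integrator RNN and return a scalar loss."""
    h = h0
    for x in xs:
        h = (1.0 - alpha) * h + alpha * jnp.tanh(W_h @ h + x)
    return jnp.sum(h ** 2)  # toy readout loss

d, T = 4, 20
W_h = 0.5 * jax.random.normal(jax.random.PRNGKey(0), (d, d))
xs = 0.1 * jax.random.normal(jax.random.PRNGKey(1), (T, d))
h0 = jnp.zeros(d)

# Gradient w.r.t. the recurrent weights for different gate values:
# the gate rescales how large a step a fixed global learning rate
# effectively takes on W_h, i.e. an implicit per-parameter step size.
for alpha in (0.1, 0.5, 0.9):
    g = jax.grad(rollout_loss)(W_h, alpha, xs, h0)
    print(f"alpha={alpha:.1f}  ||dL/dW_h|| = {jnp.linalg.norm(g):.4f}")
```

Under these assumptions, the printed gradient norms vary systematically with the gate value, which is the kind of gate-induced modulation of effective step sizes that the abstract attributes to the exact Jacobian analysis.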
