
Renormalizable Spectral-Shell Dynamics as the Origin of Neural Scaling Laws

Yizhou Zhang
Main: 29 Pages
3 Figures
Bibliography: 5 Pages
Appendix: 3 Pages
Abstract

Neural scaling laws and double-descent phenomena suggest that deep-network training obeys a simple macroscopic structure despite highly nonlinear optimization dynamics. We derive such structure directly from gradient descent in function space. For mean-squared-error loss, the training error evolves as $\dot e_t = -M(t)\,e_t$ with $M(t) = J_{\theta(t)} J_{\theta(t)}^{\!*}$, a time-dependent self-adjoint operator induced by the network Jacobian. Using Kato perturbation theory, we obtain an exact system of coupled modewise ODEs in the instantaneous eigenbasis of $M(t)$. To extract macroscopic behavior, we introduce a logarithmic spectral-shell coarse-graining and track quadratic error energy across shells. Microscopic interactions within each shell cancel identically at the energy level, so shell energies evolve only through dissipation and external inter-shell interactions. We formalize this via a \emph{renormalizable shell-dynamics} assumption, under which cumulative microscopic effects reduce to a controlled net flux across shell boundaries. Assuming an effective power-law spectral transport in a relevant resolution range, the shell dynamics admits a self-similar solution with a moving resolution frontier and explicit scaling exponents. This framework explains neural scaling laws and double descent, and unifies lazy (NTK-like) training and feature learning as two limits of the same spectral-shell dynamics.
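To make the spectral-shell picture concrete, here is a minimal numerical sketch (not the paper's code) of the lazy, NTK-like limit in which $M$ is frozen: modewise errors decay as $e_k(t) = e_k(0)\,e^{-\mu_k t}$, and grouping the eigenvalues into logarithmic shells exposes a moving resolution frontier and a power-law decay of the total error energy. The power-law spectrum $\mu_k \propto k^{-\alpha}$, the exponent value, and the shell width are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Illustrative assumptions (not from the paper): frozen operator M with a
# power-law spectrum mu_k ~ k^(-alpha) and unit initial modewise errors.
alpha = 1.5                        # assumed spectral decay exponent
K = 100_000                        # number of modes
k = np.arange(1, K + 1)
mu = k.astype(float) ** (-alpha)   # eigenvalues of the frozen operator M
e0 = np.ones(K)                    # initial modewise errors e_k(0)

# Logarithmic spectral shells: shell s collects eigenvalues in [2^-(s+1), 2^-s).
shells = np.floor(-np.log2(mu)).astype(int)
n_shells = int(shells.max()) + 1
init_energy = np.bincount(shells, weights=e0 ** 2, minlength=n_shells)

def shell_energies(t):
    """Quadratic error energy per shell in the lazy limit e_k(t) = e_k(0) exp(-mu_k t)."""
    energy = (e0 * np.exp(-mu * t)) ** 2
    return np.bincount(shells, weights=energy, minlength=n_shells)

for t in [1e0, 1e2, 1e4]:
    E = shell_energies(t)
    # Resolution frontier: first shell whose energy has not yet dropped to half
    # its initial value (all faster shells above it are essentially resolved).
    frontier = int(np.argmin(E < 0.5 * init_energy))
    print(f"t={t:8.0e}  total error energy={E.sum():10.3e}  frontier shell~{frontier}")

# The total energy decays roughly as a power law in t, and the frontier shell
# index advances ~ log2(t): shells with mu * t >> 1 are resolved, slower shells
# have barely moved. This is the lazy-limit caricature of the shell dynamics.
```

With feature learning, $M(t)$ rotates and the shells exchange energy through inter-shell flux, which is the regime the renormalizable shell-dynamics assumption is meant to control; the fixed-spectrum sketch above only illustrates the coarse-graining itself.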
