The Spectral Dimension of NTKs is Constant: A Theory of Implicit Regularization, Finite-Width Stability, and Scalable Estimation
Modern deep networks are heavily overparameterized yet often generalize well, suggesting a form of low intrinsic complexity that parameter counts do not capture. We study this complexity at initialization through the effective rank of the Neural Tangent Kernel (NTK) Gram matrix. For i.i.d. data and the infinite-width NTK, we prove a constant-limit law: the effective rank converges to a constant independent of the sample size, with sub-Gaussian concentration. We further establish finite-width stability: if the finite-width NTK deviates from its infinite-width limit in operator norm by a small amount (controlled by the width), then the effective rank changes by a correspondingly small amount. We design a scalable estimator based on random output probes and a CountSketch of parameter Jacobians, and we prove conditional unbiasedness and consistency with explicit variance bounds. On CIFAR-10 with ResNet-20/56 (widths 16/32) across a range of sample sizes, we observe approximately constant effective ranks with fitted slopes near zero, consistent with the theory, and the kernel-moment prediction closely matches the fitted constants.
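To make the estimation pipeline concrete, here is a minimal NumPy sketch of the sketched-Gram idea: per-example parameter gradients (which in the paper's setting would come from backpropagating random output probes through the network) are compressed with a CountSketch along the parameter axis, the sketched Gram matrix is formed, and an effective rank is computed from it. The toy gradient matrix `G`, the sketch width `k`, and the participation-ratio definition tr(K)^2 / ||K||_F^2 of effective rank are illustrative assumptions; the paper's exact estimator, probe construction, and effective-rank definition may differ.

```python
# Minimal sketch, assuming: (i) effective rank = participation ratio
# tr(K)^2 / ||K||_F^2 (one common definition; the paper's may differ),
# (ii) rows of G stand in for per-example parameter gradients obtained
# by contracting the network output with a random probe vector.
import numpy as np

rng = np.random.default_rng(0)

def countsketch(G, k, rng):
    """CountSketch the rows of the n x p gradient matrix G down to R^k.
    Row inner products of the sketch are unbiased estimates of those of G."""
    n, p = G.shape
    buckets = rng.integers(0, k, size=p)      # hash h: [p] -> [k]
    signs = rng.choice([-1.0, 1.0], size=p)   # sign s: [p] -> {+-1}
    S = np.zeros((n, k))
    np.add.at(S.T, buckets, (G * signs).T)    # accumulate signed columns into buckets
    return S

def effective_rank(K):
    """Participation-ratio effective rank tr(K)^2 / ||K||_F^2 (assumed definition)."""
    return np.trace(K) ** 2 / np.sum(K ** 2)

# Toy data: n examples, p "parameters"; in practice each row would be the
# gradient of a random output probe of the network at one input.
n, p, k = 100, 20000, 2048
G = rng.standard_normal((n, p)) / np.sqrt(p)

K_exact  = G @ G.T                      # exact Gram matrix K_ij = <g_i, g_j>
G_sketch = countsketch(G, k, rng)
K_approx = G_sketch @ G_sketch.T        # sketched Gram, entrywise unbiased for K_exact

print(f"effective rank (exact):    {effective_rank(K_exact):.2f}")
print(f"effective rank (sketched): {effective_rank(K_approx):.2f}")
```

Averaging several independent sketches (or enlarging `k`) reduces the variance of the sketched Gram entries, which is the role played by the explicit variance bounds in the paper.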