
Provable Benefits of Sinusoidal Activation for Modular Addition

Main: 11 pages
16 figures
Bibliography: 5 pages
1 table
Appendix: 44 pages
Abstract

This paper studies the role of activation functions in learning modular addition with two-layer neural networks. We first establish a sharp expressivity gap: sine MLPs admit width-2 exact realizations for any fixed length $m$ and, with bias, width-2 exact realizations uniformly over all lengths. In contrast, the width of ReLU networks must scale linearly with $m$ to interpolate, and they cannot simultaneously fit two lengths with different residues modulo $p$. We then provide a novel Natarajan-dimension generalization bound for sine networks, yielding nearly optimal sample complexity $\widetilde{\mathcal{O}}(p)$ for ERM over constant-width sine networks. We also derive a width-independent, margin-based generalization bound for sine networks in the overparametrized regime and validate it experimentally. Empirically, sine networks generalize consistently better than ReLU networks across regimes and exhibit strong length extrapolation.
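To make the expressivity claim concrete, here is a hedged sketch (not the paper's exact construction) of how a two-layer network with only two sine hidden units can compute $(a+b) \bmod p$ exactly: one unit computes $\sin(2\pi(a+b)/p)$ and the other, via a bias of $\pi/2$, computes $\cos(2\pi(a+b)/p)$; the linear output layer for class $c$ then produces the logit $\cos(2\pi(a+b-c)/p)$, which is uniquely maximized at $c = (a+b) \bmod p$.

```python
import numpy as np

def sine_mod_add(a: int, b: int, p: int) -> int:
    """Width-2 sine network sketch for modular addition (illustrative, not
    the paper's construction). Both hidden units are sine neurons; the
    second uses a bias of pi/2, so it equals cos(2*pi*(a+b)/p)."""
    theta = 2 * np.pi * (a + b) / p
    hidden = np.array([np.sin(theta), np.sin(theta + np.pi / 2)])  # width 2
    # Output weights for class c: (sin(2*pi*c/p), cos(2*pi*c/p)), so that
    # logit_c = sin(t)sin(c') + cos(t)cos(c') = cos(2*pi*(a+b-c)/p).
    classes = 2 * np.pi * np.arange(p) / p
    out_weights = np.stack([np.sin(classes), np.cos(classes)], axis=0)  # (2, p)
    logits = hidden @ out_weights
    return int(np.argmax(logits))

# Exact interpolation over all input pairs for a small modulus:
p = 7
assert all(sine_mod_add(a, b, p) == (a + b) % p
           for a in range(p) for b in range(p))
```

The key design point is that a sine activation turns the sum $a+b$ into a point on the unit circle, so a fixed linear readout can recover the residue class by a cosine-similarity argmax; a ReLU network has no such periodic primitive, which is why its width must grow with the input length.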
