
Beyond Gaussian Initializations: Signal Preserving Weight Initialization for Odd-Sigmoid Activations

Main: 10 pages · Appendix: 33 pages · Bibliography: 3 pages · 40 figures · 7 tables
Abstract

Activation functions critically influence trainability and expressivity, and recent work has therefore explored a broad range of nonlinearities. However, widely used Gaussian i.i.d. initializations are designed to preserve activation variance under wide- or infinite-width assumptions. In deep, relatively narrow networks with sigmoidal nonlinearities, these schemes often drive preactivations into saturation and collapse gradients. To address this, we introduce a class of odd-sigmoid activations and propose an activation-aware initialization tailored to any function in this class. Our method remains robust over a wide band of variance scales, preserving both forward signal variance and backpropagated gradient norms even in very deep and narrow networks. Empirically, across standard image benchmarks, we find that the proposed initialization is substantially less sensitive to depth, width, and activation scale than Gaussian initializations. In physics-informed neural networks (PINNs), scaled odd-sigmoid activations combined with our initialization achieve lower losses than Gaussian-based setups, suggesting that diagonal-plus-noise weights provide a practical alternative when Gaussian initialization breaks down.
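
The sketch below is a minimal, hypothetical illustration of the abstract's "diagonal-plus-noise" idea, not the paper's actual scheme: it assumes an identity-plus-small-Gaussian weight form, uses plain tanh as a stand-in odd-sigmoid, and compares forward variance propagation against a Gaussian i.i.d. baseline in a deep, narrow network. The function names, the noise_std value, and the depth/width settings are illustrative assumptions.

    # Illustrative sketch only: diagonal-plus-noise vs. Gaussian i.i.d. initialization
    # in a deep, narrow MLP with an odd-sigmoid (tanh) activation.
    import numpy as np

    def diag_plus_noise(fan_in, fan_out, noise_std=0.01, rng=None):
        """Assumed form: identity-like diagonal plus small Gaussian noise."""
        rng = np.random.default_rng() if rng is None else rng
        w = np.zeros((fan_out, fan_in))
        d = min(fan_in, fan_out)
        w[np.arange(d), np.arange(d)] = 1.0             # diagonal to carry the signal
        w += noise_std * rng.standard_normal(w.shape)   # small symmetry-breaking noise
        return w

    def gaussian_iid(fan_in, fan_out, rng=None):
        """Xavier-style Gaussian baseline for comparison."""
        rng = np.random.default_rng() if rng is None else rng
        return rng.standard_normal((fan_out, fan_in)) / np.sqrt(fan_in)

    def forward_variance(init_fn, depth=100, width=16, n_samples=256, seed=0):
        """Track activation variance layer by layer under tanh."""
        rng = np.random.default_rng(seed)
        x = rng.standard_normal((n_samples, width))
        variances = []
        for _ in range(depth):
            w = init_fn(width, width, rng=rng)
            x = np.tanh(x @ w.T)                        # odd-sigmoid nonlinearity
            variances.append(x.var())
        return variances

    if __name__ == "__main__":
        v_dpn = forward_variance(diag_plus_noise)
        v_gau = forward_variance(gaussian_iid)
        print(f"final-layer variance  diag+noise: {v_dpn[-1]:.4f}  gaussian: {v_gau[-1]:.4f}")

Running the script prints the last-layer activation variance under both initializations; the paper's activation-aware scaling for general odd-sigmoids is not reproduced here.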
