Deep Delta Learning

Yifan Zhang
Yifeng Liu
Mengdi Wang
Quanquan Gu
Main: 12 pages · Bibliography: 5 pages · Appendix: 5 pages
1 figure · 1 table
Abstract

The efficacy of deep residual networks is fundamentally predicated on the identity shortcut connection. While this mechanism effectively mitigates the vanishing gradient problem, it imposes a strictly additive inductive bias on feature transformations, thereby limiting the network's capacity to model complex state transitions. In this paper, we introduce Deep Delta Learning (DDL), a novel architecture that generalizes the standard residual connection by modulating the identity shortcut with a learnable, data-dependent geometric transformation. This transformation, termed the Delta Operator, constitutes a rank-1 perturbation of the identity matrix, parameterized by a reflection direction vector $\mathbf{k}(\mathbf{X})$ and a gating scalar $\beta(\mathbf{X})$. We provide a spectral analysis of this operator, demonstrating that the gate $\beta(\mathbf{X})$ enables dynamic interpolation between identity mapping, orthogonal projection, and geometric reflection. Furthermore, we restructure the residual update as a synchronous rank-1 injection, where the gate acts as a dynamic step size governing both the erasure of old information and the writing of new features. This unification empowers the network to explicitly control the spectrum of its layer-wise transition operator, enabling the modeling of complex, non-monotonic dynamics while preserving the stable training characteristics of gated residual architectures.
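The abstract does not state the operator's explicit form, but a rank-1 perturbation of the identity with reflection direction $\mathbf{k}$ and gate $\beta$ is naturally read as $\Delta = I - \beta\,\mathbf{k}\mathbf{k}^\top$ with unit-norm $\mathbf{k}$, so that $\beta = 0, 1, 2$ yields identity, orthogonal projection, and Householder reflection respectively. The sketch below illustrates this spectral interpolation, plus one plausible form of the synchronous rank-1 injection in which the same gate erases old content along $\mathbf{k}$ and writes a new value $\mathbf{v}$ there. Function names `delta_operator` and `delta_update` are illustrative, not from the paper.

```python
import numpy as np

def delta_operator(x, k, beta):
    """Apply the rank-1 perturbation (I - beta * k k^T) to x.

    Assumed form of the Delta Operator: k is normalized to unit length,
    and beta in [0, 2] interpolates identity (0), orthogonal projection
    onto the hyperplane k-perp (1), and reflection about it (2).
    """
    k = k / np.linalg.norm(k)
    return x - beta * k * (k @ x)   # eigenvalue along k is (1 - beta)

def delta_update(S, k, v, beta):
    """One plausible synchronous rank-1 injection on a state matrix S:
    the gate beta scales both the erasure of the old content stored
    along direction k and the writing of the new value v there."""
    k = k / np.linalg.norm(k)
    return S - beta * np.outer(S @ k, k) + beta * np.outer(v, k)

# Demonstrate the three regimes of the gate.
d = 4
rng = np.random.default_rng(0)
x = rng.standard_normal(d)
k = rng.standard_normal(d)

x_id = delta_operator(x, k, 0.0)   # identity mapping
x_pr = delta_operator(x, k, 1.0)   # projection: component along k erased
x_rf = delta_operator(x, k, 2.0)   # reflection: norm preserved
```

With $\beta = 1$ the update fully overwrites the slot along $\mathbf{k}$: after `delta_update(S, k, v, 1.0)`, the state's content in direction $\mathbf{k}$ equals $\mathbf{v}$ exactly, which is the "erase old, write new" behavior the abstract attributes to the gate.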
