Differentiable Sparsity via $D$-Gating: Simple and Versatile Structured Penalization

28 September 2025

Main: 9 pages, Bibliography: 5 pages, Appendix: 23 pages, 25 figures, 5 tables

Abstract

Structured sparsity regularization offers a principled way to compact neural networks, but its non-differentiability breaks compatibility with conventional stochastic gradient descent and requires either specialized optimizers or additional post-hoc pruning without formal guarantees. In this work, we propose $D$-Gating, a fully differentiable structured overparameterization that splits each group of weights into a primary weight vector and multiple scalar gating factors. We prove that any local minimum under $D$-Gating is also a local minimum using non-smooth structured $L_{2,2/D}$ penalization, and further show that the $D$-Gating objective converges at least exponentially fast to the $L_{2,2/D}$-regularized loss in the gradient flow limit. Together, our results show that $D$-Gating is theoretically equivalent to solving the original group sparsity problem, yet induces distinct learning dynamics that evolve from a non-sparse regime into sparse optimization. We validate our theory across vision, language, and tabular tasks, where $D$-Gating consistently delivers strong performance-sparsity tradeoffs and outperforms both direct optimization of structured penalties and conventional pruning baselines.
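
The gating construction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, array shapes, and the plain squared-$L_2$ penalty on all factors are assumptions made for illustration. Each group's effective weight is taken to be a primary vector times the product of $D-1$ scalar gates. By the AM-GM inequality, the smooth penalty on the factors is bounded below by $D \sum_j \|w_j\|_2^{2/D}$, with equality when all factor magnitudes in a group are balanced, which is consistent with the $L_{2,2/D}$ connection stated above (the constant $D$ follows from this sketch's penalty choice).

```python
import numpy as np

def effective_weights(v, gates):
    """Collapse the overparameterization into ordinary weights.

    v:     (G, K) array, one primary weight vector per group (assumed layout).
    gates: (G, D-1) array, the scalar gating factors of each group.
    """
    return v * np.prod(gates, axis=1, keepdims=True)

def smooth_penalty(v, gates):
    """Differentiable squared-L2 penalty on all factors (assumed form)."""
    return np.sum(v ** 2) + np.sum(gates ** 2)

def group_penalty(w, D):
    """Non-smooth structured penalty D * sum_j ||w_j||_2^(2/D)."""
    norms = np.linalg.norm(w, axis=1)
    return D * np.sum(norms ** (2.0 / D))

def balanced_factorization(w, D):
    """Factor w so that ||v_j|| and every |gate| equal ||w_j||^(1/D)."""
    norms = np.linalg.norm(w, axis=1, keepdims=True)
    scale = norms ** (1.0 / D)
    safe = np.where(norms > 0, norms, 1.0)  # avoid dividing by zero for empty groups
    v = w / safe * scale
    gates = np.repeat(scale, D - 1, axis=1)
    return v, gates

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = 3
    w = rng.normal(size=(4, 6))
    w[1] = 0.0  # a fully pruned group
    v, gates = balanced_factorization(w, D)
    print("smooth penalty at balanced factors:", smooth_penalty(v, gates))
    print("group penalty on collapsed weights:", group_penalty(w, D))
```

At a balanced factorization the two printed values coincide; for any other factorization of the same weights the smooth penalty is at least as large, which is why minimizing over the factors recovers the group penalty in this sketch.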
