Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries

4 February 2025
Chris Kolb, Tobias Weber, Bernd Bischl, David Rügamer
Abstract

Sparse regularization techniques are well-established in machine learning, yet their application in neural networks remains challenging due to the non-differentiability of penalties like the $L_1$ norm, which is incompatible with stochastic gradient descent. A promising alternative is shallow weight factorization, where weights are decomposed into two factors, allowing for smooth optimization of $L_1$-penalized neural networks by adding differentiable $L_2$ regularization to the factors. In this work, we introduce deep weight factorization, extending previous shallow approaches to more than two factors. We theoretically establish equivalence of our deep factorization with non-convex sparse regularization and analyze its impact on training dynamics and optimization. Due to the limitations posed by standard training practices, we propose a tailored initialization scheme and identify important learning rate requirements necessary for training factorized networks. We demonstrate the effectiveness of our deep weight factorization through experiments on various architectures and datasets, consistently outperforming its shallow counterpart and widely used pruning methods.
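
The sketch below is a minimal, hypothetical PyTorch illustration of the mechanism as described in the abstract: each layer weight is parameterized as an elementwise product of several factors, and a differentiable squared-$L_2$ penalty on the factors stands in for a non-differentiable $L_1$ penalty on the collapsed weight. The class name DeepFactorizedLinear, the depth and init_scale arguments, and the plain Gaussian initialization are illustrative placeholders; they are not the authors' implementation, nor the tailored initialization scheme the paper proposes.

import torch
import torch.nn as nn
import torch.nn.functional as F


class DeepFactorizedLinear(nn.Module):
    """Linear layer whose weight is an elementwise product of `depth` factors (illustrative)."""

    def __init__(self, in_features, out_features, depth=3, init_scale=1.0):
        super().__init__()
        # One factor tensor per depth level, each with the shape of the full weight matrix.
        self.factors = nn.ParameterList(
            [nn.Parameter(init_scale * torch.randn(out_features, in_features))
             for _ in range(depth)]
        )
        self.bias = nn.Parameter(torch.zeros(out_features))

    def collapsed_weight(self):
        # Effective weight: Hadamard product of all factors.
        w = self.factors[0]
        for f in self.factors[1:]:
            w = w * f
        return w

    def forward(self, x):
        return F.linear(x, self.collapsed_weight(), self.bias)

    def factor_penalty(self):
        # Differentiable squared-L2 penalty on the factors, used in place of a
        # non-differentiable L1 penalty on the collapsed weight.
        return sum((f ** 2).sum() for f in self.factors)


# Illustrative usage: a dummy regression loss plus the smooth factor penalty,
# trainable with plain stochastic gradient descent.
layer = DeepFactorizedLinear(16, 8, depth=3)
x, y = torch.randn(4, 16), torch.randn(4, 8)
loss = F.mse_loss(layer(x), y) + 1e-3 * layer.factor_penalty()
loss.backward()

With two factors this recovers the shallow factorization mentioned in the abstract; with more factors the collapsed weight corresponds to the deep factorization whose equivalence to non-convex sparse regularization the paper establishes.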

@article{kolb2025_2502.02496,
  title={Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries},
  author={Chris Kolb and Tobias Weber and Bernd Bischl and David Rügamer},
  journal={arXiv preprint arXiv:2502.02496},
  year={2025}
}