
arXiv:1708.03735
Sparse Coding and Autoencoders

International Symposium on Information Theory (ISIT), 2017
12 August 2017
Akshay Rangamani
Anirbit Mukherjee
A. Basu
T. Ganapathi
Ashish Arora
S. Chin
T. Tran
Abstract

In "Dictionary Learning" one tries to recover incoherent matrices $A^* \in \mathbb{R}^{n \times h}$ (typically overcomplete and whose columns are assumed to be normalized) and sparse vectors $x^* \in \mathbb{R}^h$ with a small support of size $h^p$ for some $0 < p < 1$, while having access to observations $y \in \mathbb{R}^n$ where $y = A^* x^*$. In this work we undertake a rigorous analysis of whether gradient descent on the squared loss of an autoencoder can solve the dictionary learning problem. The "Autoencoder" architecture we consider is an $\mathbb{R}^n \rightarrow \mathbb{R}^n$ mapping with a single ReLU activation layer of size $h$. Under very mild distributional assumptions on $x^*$, we prove that the norm of the expected gradient of the standard squared loss function is asymptotically (in the sparse code dimension) negligible for all points in a small neighborhood of $A^*$. This is supported with experimental evidence using synthetic data. We also conduct experiments to suggest that $A^*$ is a local minimum. Along the way we prove that a layer of ReLU gates can be set up to automatically recover the support of the sparse codes. This property holds independent of the loss function. We believe that it could be of independent interest.
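The support-recovery claim can be illustrated with a minimal synthetic sketch: generate $y = A^* x^*$ and pass $y$ through a single ReLU layer with weights $(A^*)^\top$ and a uniform bias. The dimensions, the bias value `eps`, and the coefficient range below are illustrative choices for this sketch, not parameters taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Observation dimension n, code dimension h, support size h^p
# (illustrative choice: p = 1/6, so the support has 4096^(1/6) = 4 entries).
n, h, p = 2048, 4096, 1.0 / 6.0
s = round(h ** p)

# A*: random dictionary with unit-norm columns (a synthetic stand-in
# for an incoherent dictionary; random Gaussian columns are incoherent
# with high probability).
A = rng.standard_normal((n, h))
A /= np.linalg.norm(A, axis=0)

# x*: sparse code supported on s random coordinates, with coefficients
# bounded away from zero (illustrative range [0.6, 1.0]).
x = np.zeros(h)
supp = rng.choice(h, size=s, replace=False)
x[supp] = rng.uniform(0.6, 1.0, size=s)

# Observation y = A* x*.
y = A @ x

# One ReLU layer with encoder weights (A*)^T and bias -eps: because the
# columns of A* are nearly orthogonal, (A*)^T y is close to x* on the
# support and close to 0 off it, so a suitable eps zeroes out exactly
# the off-support coordinates (eps = 0.3 is an illustrative choice).
eps = 0.3
hidden = np.maximum(A.T @ y - eps, 0.0)  # ReLU((A*)^T y - eps)
y_hat = A @ hidden                       # decoder: map the code back to R^n

# The nonzero coordinates of the hidden layer recover the support of x*.
recovered = np.flatnonzero(hidden)
```

Note that the recovery threshold works only because the cross-terms $\langle a_i, a_j \rangle x^*_j$ are small relative to `eps`; shrinking $n$ or growing the support size $h^p$ eventually makes them comparable and breaks the separation.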
