Phase Transitions in Rate Distortion Theory and Deep Learning

3 August 2020
Philipp Grohs
Andreas Klotz
F. Voigtlaender
Abstract

Rate distortion theory is concerned with optimally encoding a given signal class $\mathcal{S}$ using a budget of $R$ bits, as $R \to \infty$. We say that $\mathcal{S}$ can be compressed at rate $s$ if we can achieve an error of $\mathcal{O}(R^{-s})$ for encoding $\mathcal{S}$; the supremal compression rate is denoted $s^\ast(\mathcal{S})$. Given a fixed coding scheme, there usually are elements of $\mathcal{S}$ that are compressed at a higher rate than $s^\ast(\mathcal{S})$ by the given coding scheme; we study the size of this set of signals. We show that for certain "nice" signal classes $\mathcal{S}$, a phase transition occurs: we construct a probability measure $\mathbb{P}$ on $\mathcal{S}$ such that for every coding scheme $\mathcal{C}$ and any $s > s^\ast(\mathcal{S})$, the set of signals encoded with error $\mathcal{O}(R^{-s})$ by $\mathcal{C}$ forms a $\mathbb{P}$-null-set. In particular, our results apply to balls in Besov and Sobolev spaces that embed compactly into $L^2(\Omega)$ for a bounded Lipschitz domain $\Omega$. As an application, we show that several existing sharpness results concerning function approximation using deep neural networks are generically sharp. We also provide quantitative and non-asymptotic bounds on the probability that a random $f \in \mathcal{S}$ can be encoded to within accuracy $\varepsilon$ using $R$ bits. This result is applied to the problem of approximately representing $f \in \mathcal{S}$ to within accuracy $\varepsilon$ by a (quantized) neural network that is constrained to have at most $W$ nonzero weights and is generated by an arbitrary "learning" procedure. We show that for any $s > s^\ast(\mathcal{S})$ there are constants $c, C$ such that, no matter how we choose the "learning" procedure, the probability of success is bounded from above by $\min\big\{1, 2^{C \cdot W \lceil \log_2(1+W) \rceil^2 - c \cdot \varepsilon^{-1/s}}\big\}$.
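
To make the final bound concrete, here is a minimal sketch (not from the paper): it evaluates $\min\{1, 2^{C \cdot W \lceil \log_2(1+W)\rceil^2 - c \cdot \varepsilon^{-1/s}}\}$ for a fixed weight budget $W$ as the accuracy $\varepsilon$ shrinks. The constants $c$ and $C$ are only asserted to exist in the paper; setting them to 1, and the helper name `success_probability_bound`, are illustrative assumptions.

```python
import math

def success_probability_bound(W: int, eps: float, s: float,
                              c: float = 1.0, C: float = 1.0) -> float:
    """Upper bound min{1, 2^(C*W*ceil(log2(1+W))^2 - c*eps^(-1/s))} on the
    probability that a random f in S admits an eps-accurate representation
    by a quantized network with at most W nonzero weights, for s > s*(S).
    The constants c, C are placeholders; the paper only asserts they exist."""
    exponent = C * W * math.ceil(math.log2(1 + W)) ** 2 - c * eps ** (-1.0 / s)
    if exponent >= 0:
        return 1.0            # trivial regime: the minimum is attained at 1
    return 2.0 ** exponent    # underflows harmlessly to 0.0 for very negative exponents

if __name__ == "__main__":
    # With W fixed, the bound flips from trivial (1) to essentially 0 once eps
    # falls below roughly (c / (C * W * ceil(log2(1+W))^2))^s.
    for eps in (1e-2, 3.2e-3, 3.1e-3, 1e-3):
        p = success_probability_bound(W=1000, eps=eps, s=0.5)
        print(f"eps = {eps:.1e}: success probability <= {p:.3e}")
```

With these (hypothetical) constants the printed bound switches abruptly from 1 to essentially 0 as $\varepsilon$ crosses the threshold, which mirrors the phase-transition behavior the abstract describes.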
