arXiv: 1904.06984

Depth Separations in Neural Networks: What is Actually Being Separated?

15 April 2019
Itay Safran
Ronen Eldan
Ohad Shamir
Abstract

Existing depth separation results for constant-depth networks essentially show that certain radial functions in $\mathbb{R}^d$, which can be easily approximated with depth $3$ networks, cannot be approximated by depth $2$ networks, even up to constant accuracy, unless their size is exponential in $d$. However, the functions used to demonstrate this are rapidly oscillating, with a Lipschitz parameter scaling polynomially with the dimension $d$ (or equivalently, by scaling the function, the hardness result applies to $\mathcal{O}(1)$-Lipschitz functions only when the target accuracy $\epsilon$ is at most $\text{poly}(1/d)$). In this paper, we study whether such depth separations might still hold in the natural setting of $\mathcal{O}(1)$-Lipschitz radial functions, when $\epsilon$ does not scale with $d$. Perhaps surprisingly, we show that the answer is negative: in contrast to the intuition suggested by previous work, it is possible to approximate $\mathcal{O}(1)$-Lipschitz radial functions with depth $2$, size $\text{poly}(d)$ networks, for every constant $\epsilon$. We complement this by showing that approximating such functions is also possible with depth $2$, size $\text{poly}(1/\epsilon)$ networks, for every constant $d$. Finally, we show that it is not possible to have polynomial dependence on both $d$ and $1/\epsilon$ simultaneously. Overall, our results indicate that in order to show depth separations for expressing $\mathcal{O}(1)$-Lipschitz functions with constant accuracy -- if at all possible -- one would need fundamentally different techniques than existing ones in the literature.
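
The scaling remark in the abstract ("or equivalently, by scaling the function") can be spelled out as a short worked step. This is only a sketch: the symbols $f$, $g$, $N$, $L$ and the unspecified norm $\|\cdot\|$ are notation assumed here for illustration and are not taken from the abstract.

```latex
% Assumed notation (not from the abstract): f is the hard radial function,
% L = poly(d) is its Lipschitz constant, N is any depth-2 network,
% and \|\cdot\| is whatever norm measures approximation error.
\begin{align*}
  g &:= f / L
    && \text{($g$ is $1$-Lipschitz, since $f$ is $L$-Lipschitz)} \\
  \| f - N \| \le \epsilon
    &\iff \Bigl\| g - \tfrac{1}{L} N \Bigr\| \le \frac{\epsilon}{L}
    && \text{(divide both sides by $L > 0$)} \\
  \text{size}\bigl(\tfrac{1}{L} N\bigr) &= \text{size}(N)
    && \text{(rescale only the output weights)}
\end{align*}
% Hence hardness of approximating f to constant accuracy \epsilon is the same
% statement as hardness of approximating the 1-Lipschitz g to accuracy
% \epsilon / L = \epsilon / poly(d), i.e. accuracy at most poly(1/d).
```

Read this way, the earlier separation results say nothing about $\mathcal{O}(1)$-Lipschitz targets at constant accuracy, which is the regime the abstract goes on to address.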
