
Ridgeless Interpolation with Shallow ReLU Networks in 1D is Nearest Neighbor Curvature Extrapolation and Provably Generalizes on Lipschitz Functions

27 September 2021
Boris Hanin
    MLT
arXiv: 2109.12960
Abstract

We prove a precise geometric description of all one layer ReLU networks $z(x;\theta)$ with a single linear unit and input/output dimensions equal to one that interpolate a given dataset $\mathcal{D} = \{(x_i, f(x_i))\}$ and, among all such interpolants, minimize the $\ell_2$-norm of the neuron weights. Such networks can intuitively be thought of as those that minimize the mean-squared error over $\mathcal{D}$ plus an infinitesimal weight decay penalty. We therefore refer to them as ridgeless ReLU interpolants. Our description proves that, to extrapolate values $z(x;\theta)$ for inputs $x \in (x_i, x_{i+1})$ lying between two consecutive datapoints, a ridgeless ReLU interpolant simply compares the signs of the discrete estimates for the curvature of $f$ at $x_i$ and $x_{i+1}$ derived from the dataset $\mathcal{D}$. If the curvature estimates at $x_i$ and $x_{i+1}$ have different signs, then $z(x;\theta)$ must be linear on $(x_i, x_{i+1})$. If in contrast the curvature estimates at $x_i$ and $x_{i+1}$ are both positive (resp. negative), then $z(x;\theta)$ is convex (resp. concave) on $(x_i, x_{i+1})$. Our results show that ridgeless ReLU interpolants achieve the best possible generalization for learning 1d Lipschitz functions, up to universal constants.
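To make the curvature-sign rule above concrete, here is a minimal Python sketch (an illustration, not code from the paper) that classifies each interval between consecutive datapoints. It assumes the discrete curvature estimate at an interior datapoint $x_i$ is the second divided difference of the data, i.e. the change in secant slope across $x_i$; how to treat the two boundary intervals, where only one estimate exists, is a choice made here (they are classified as linear).

```python
import numpy as np

def interval_shapes(x, y):
    """Classify each interval (x_i, x_{i+1}) as 'linear', 'convex', or 'concave'
    per the curvature-sign rule in the abstract.

    Assumption: the discrete curvature estimate at an interior point x_i is the
    sign of the second divided difference (change in adjacent secant slopes).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    slopes = np.diff(y) / np.diff(x)   # secant slope on each interval (x_i, x_{i+1})
    curv = np.sign(np.diff(slopes))    # curvature sign at interior points x_1, ..., x_{n-2}
    shapes = []
    for i in range(len(slopes)):
        # Curvature estimates at the interval's two endpoints; a boundary
        # endpoint has no estimate and is treated as 0 (hence 'linear') here.
        left = curv[i - 1] if i - 1 >= 0 else 0.0
        right = curv[i] if i < len(curv) else 0.0
        if left > 0 and right > 0:
            shapes.append("convex")
        elif left < 0 and right < 0:
            shapes.append("concave")
        else:
            shapes.append("linear")  # mixed or missing curvature signs
    return shapes

# Example: samples of y = x^2 have positive curvature everywhere, so the
# interior intervals come out 'convex' and the boundary intervals 'linear'.
print(interval_shapes([0, 1, 2, 3, 4], [0, 1, 4, 9, 16]))
# ['linear', 'convex', 'convex', 'linear']
```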
