Online Learning of Neural Networks

14 May 2025
Amit Daniely
Idan Mehalel
Elchanan Mossel
Abstract

We study online learning of feedforward neural networks with the sign activation function that implement functions from the unit ball in $\mathbb{R}^d$ to a finite label set $\{1, \ldots, Y\}$.

First, we characterize a margin condition that is sufficient, and in some cases necessary, for online learnability of a neural network: every neuron in the first hidden layer classifies all instances with some margin $\gamma$ bounded away from zero. Quantitatively, we prove that for any net, the optimal mistake bound is at most approximately $\mathtt{TS}(d,\gamma)$, which is the $(d,\gamma)$-totally-separable-packing number, a more restricted variant of the standard $(d,\gamma)$-packing number. We complement this result by constructing a net on which any learner makes $\mathtt{TS}(d,\gamma)$ many mistakes. We also give a quantitative lower bound of approximately $\mathtt{TS}(d,\gamma) \geq \max\{1/(\gamma \sqrt{d})^d, d\}$ when $\gamma \geq 1/2$, implying that for some nets and input sequences every learner will err $\exp(d)$ many times, and that a dimension-free mistake bound is almost always impossible.

To remedy this inevitable dependence on $d$, it is natural to place additional restrictions on the network so that the dependence on $d$ is removed. We study two such restrictions. The first is the multi-index model, in which the function computed by the net depends only on $k \ll d$ orthonormal directions. We prove a mistake bound of approximately $(1.5/\gamma)^{k+2}$ in this model. The second is the extended margin assumption. In this setting, we assume that all neurons (in all layers) of the network classify every incoming input from the previous layer with margin $\gamma$ bounded away from zero. In this model, we prove a mistake bound of approximately $(\log Y)/\gamma^{O(L)}$, where $L$ is the depth of the network.
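To make the setting concrete, the Python sketch below (assuming NumPy) simulates the online protocol that the mistake bounds above refer to: at each round the learner commits to a label for an instance from the unit ball, the true label produced by a fixed sign-activation feedforward network is then revealed, and a mistake is recorded on disagreement. The names sign_net, MajorityLearner, and online_mistakes are illustrative inventions, not code or algorithms from the paper; the baseline learner shown has no nontrivial mistake guarantee.

import numpy as np

def sign_net(x, weights):
    # Target concept: feedforward net with sign activations.
    # `weights` is a list of weight matrices; the label is the argmax of the
    # final layer's sign vector (an arbitrary illustrative choice, 0-indexed).
    h = x
    for W in weights:
        h = np.sign(W @ h)
    return int(np.argmax(h))

class MajorityLearner:
    # Hypothetical baseline: predicts the most frequent label seen so far.
    def __init__(self, num_labels):
        self.counts = np.zeros(num_labels, dtype=int)
    def predict(self, x):
        return int(np.argmax(self.counts)) if self.counts.sum() else 0
    def update(self, x, y):
        self.counts[y] += 1

def online_mistakes(instances, weights, learner):
    # Online protocol: predict, observe the true label, count mistakes.
    mistakes = 0
    for x in instances:
        y_hat = learner.predict(x)        # learner commits to a prediction
        y = sign_net(x, weights)          # true label revealed afterwards
        if y_hat != y:
            mistakes += 1
        learner.update(x, y)              # learner may update its state
    return mistakes

# Example: d = 5, one hidden layer, instances projected into the unit ball.
rng = np.random.default_rng(0)
d, hidden, labels, T = 5, 4, 3, 100
weights = [rng.standard_normal((hidden, d)), rng.standard_normal((labels, hidden))]
X = rng.standard_normal((T, d))
X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)
print(online_mistakes(X, weights, MajorityLearner(labels)))

The paper's results bound the best achievable value of this mistake count over worst-case input sequences, under the margin and structural assumptions described in the abstract.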

@article{daniely2025_2505.09167,
  title={ Online Learning of Neural Networks },
  author={ Amit Daniely and Idan Mehalel and Elchanan Mossel },
  journal={arXiv preprint arXiv:2505.09167},
  year={ 2025 }
}