Nonlinear Approximation via Compositions

26 February 2019
Zuowei Shen, Haizhao Yang, Shijun Zhang
arXiv:1902.10170 · PDF · HTML
Abstract

Given a function dictionary $\mathcal{D}$ and an approximation budget $N\in\mathbb{N}^+$, nonlinear approximation seeks the linear combination of the best $N$ terms $\{T_n\}_{1\le n\le N}\subseteq\mathcal{D}$ to approximate a given function $f$ with the minimum approximation error
\[
\varepsilon_{L,f}:=\min_{\{g_n\}\subseteq\mathbb{R},\,\{T_n\}\subseteq\mathcal{D}}\Big\|f(x)-\sum_{n=1}^{N} g_n T_n(x)\Big\|.
\]
Motivated by the recent success of deep learning, we propose dictionaries whose functions take the form of compositions,
\[
T(x)=T^{(L)}\circ T^{(L-1)}\circ\cdots\circ T^{(1)}(x)\quad\text{for all } T\in\mathcal{D},
\]
and implement $T$ using ReLU feed-forward neural networks (FNNs) with $L$ hidden layers. We further quantify the improvement of the best $N$-term approximation rate in terms of $N$ when $L$ is increased from $1$ to $2$ or $3$, demonstrating the power of compositions; when $L>3$, our analysis shows that increasing $L$ cannot further improve the approximation rate in terms of $N$. In particular, for any function $f$ on $[0,1]$, regardless of its smoothness or even its continuity, if $f$ can be approximated by a dictionary with $L=1$ at the best $N$-term approximation rate $\varepsilon_{L,f}=\mathcal{O}(N^{-\eta})$, we show that dictionaries with $L=2$ improve this rate to $\varepsilon_{L,f}=\mathcal{O}(N^{-2\eta})$. We also show that for Hölder continuous functions of order $\alpha$ on $[0,1]^d$, a dictionary with $L=3$ achieves an essentially tight best $N$-term approximation rate $\varepsilon_{L,f}=\mathcal{O}(N^{-2\alpha/d})$. Finally, we show that dictionaries consisting of wide FNNs with a few hidden layers are more attractive, in terms of computational efficiency, than dictionaries of narrow and very deep FNNs for approximating Hölder continuous functions, provided the number of computer cores available for parallel computing exceeds $N$.
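The abstract's two core objects, a depth-$L$ composition $T=T^{(L)}\circ\cdots\circ T^{(1)}$ realized as a ReLU FNN and the best $N$-term combination $\sum_n g_n T_n$, can be sketched in a few lines of code. The following is a minimal illustration, not the paper's construction: the network width, the random weights, and the least-squares fit of the coefficients $g_n$ (with the dictionary elements held fixed) are assumptions made purely for demonstration.

```python
# Illustrative sketch only (not the paper's construction): a dictionary
# element T = T^(L) o ... o T^(1) realized as a ReLU FNN with L hidden
# layers, and an N-term linear combination sum_n g_n T_n(x) fit to f.
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def make_fnn(rng, d_in, width, L):
    """A random ReLU FNN with L hidden layers mapping R^{d_in} -> R."""
    dims = [d_in] + [width] * L + [1]
    params = [(rng.standard_normal((m, n)) / np.sqrt(n), rng.standard_normal(m))
              for n, m in zip(dims[:-1], dims[1:])]

    def T(x):                            # x: (batch, d_in)
        h = x
        for W, b in params[:-1]:         # hidden layers T^(1), ..., T^(L)
            h = relu(h @ W.T + b)
        W, b = params[-1]                # final affine output layer
        return (h @ W.T + b).ravel()

    return T

rng = np.random.default_rng(0)
N, L, d = 8, 2, 1                        # budget N, depth L, input dimension d
dictionary = [make_fnn(rng, d, width=16, L=L) for _ in range(N)]

# With the T_n frozen, the optimal coefficients g_n in
# min_g || f - sum_n g_n T_n || reduce to linear least squares on a grid.
f = lambda x: np.sin(2 * np.pi * x).ravel()
x = np.linspace(0.0, 1.0, 256).reshape(-1, 1)        # samples of [0, 1]
A = np.stack([T(x) for T in dictionary], axis=1)     # (256, N) design matrix
g, *_ = np.linalg.lstsq(A, f(x), rcond=None)
err = np.linalg.norm(f(x) - A @ g) / np.sqrt(x.shape[0])
print(f"discrete L2 error of the {N}-term combination: {err:.3e}")
```

Note that the true $\varepsilon_{L,f}$ also minimizes over the choice of $\{T_n\}\subseteq\mathcal{D}$; the sketch fixes the $T_n$ and solves only the inner linear problem, which is why it reduces to ordinary least squares.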
