arXiv:2410.11227

Guarantees for Nonlinear Representation Learning: Non-identical Covariates, Dependent Data, Fewer Samples

15 October 2024
Thomas T. Zhang
Bruce D. Lee
Ingvar M. Ziemann
George J. Pappas
Nikolai Matni
Communities: CML, OOD
Abstract

A driving force behind the diverse applicability of modern machine learning is the ability to extract meaningful features across many sources. However, many practical domains involve data that are non-identically distributed across sources, and statistically dependent within each source, violating vital assumptions in existing theoretical studies. Toward addressing these issues, we establish statistical guarantees for learning general \textit{nonlinear} representations from multiple data sources that admit different input distributions and possibly dependent data. Specifically, we study the sample complexity of learning $T+1$ functions $f_\star^{(t)} \circ g_\star$ from a function class $\mathcal{F} \times \mathcal{G}$, where $f_\star^{(t)}$ are task-specific linear functions and $g_\star$ is a shared nonlinear representation. A representation $\hat{g}$ is estimated using $N$ samples from each of $T$ source tasks, and a fine-tuning function $\hat{f}^{(0)}$ is fit using $N'$ samples from a target task passed through $\hat{g}$. We show that when $N \gtrsim C_{\mathrm{dep}} \big(\dim(\mathcal{F}) + \mathrm{C}(\mathcal{G})/T\big)$, the excess risk of $\hat{f}^{(0)} \circ \hat{g}$ on the target task decays as $\nu_{\mathrm{div}} \big( \frac{\dim(\mathcal{F})}{N'} + \frac{\mathrm{C}(\mathcal{G})}{N T} \big)$, where $C_{\mathrm{dep}}$ denotes the effect of data dependency, $\nu_{\mathrm{div}}$ denotes an (estimatable) measure of \textit{task diversity} between the source and target tasks, and $\mathrm{C}(\mathcal{G})$ denotes the complexity of the representation class $\mathcal{G}$. In particular, our analysis reveals that as the number of tasks $T$ increases, both the sample requirement and the risk bound converge to those of $r$-dimensional regression as if $g_\star$ had been given, and that the effect of dependency enters only the sample requirement, leaving the risk bound matching the iid setting.
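As a concrete illustration of the two-stage procedure described in the abstract, the following NumPy sketch (our own, not the authors' code) jointly fits a shared nonlinear representation and task-specific linear heads on the $T$ source tasks by gradient descent, then freezes the learned representation and fits the target head $\hat{f}^{(0)}$ by least squares on $N'$ target samples. The tanh representation class, squared loss, optimizer, and all dimensions and sample sizes below are illustrative assumptions, not quantities from the paper.

# Minimal sketch (not the authors' implementation) of the two-stage procedure:
# (1) estimate a shared nonlinear representation g_hat from T source tasks,
# (2) freeze g_hat and fit a new linear head on N' target samples.
import numpy as np

rng = np.random.default_rng(0)
d, r, T, N, N_prime = 20, 5, 50, 200, 40   # input dim, feature dim, tasks, samples per task, target samples

def make_task(W, f):
    """Draw (X, y) with y = f . g_star(x) + noise, where g_star(x) = tanh(W x)."""
    X = rng.normal(size=(N, d))
    y = np.tanh(X @ W.T) @ f + 0.1 * rng.normal(size=N)
    return X, y

W_star = rng.normal(size=(r, d)) / np.sqrt(d)            # ground-truth shared representation
heads = [rng.normal(size=r) for _ in range(T + 1)]       # task-specific linear heads f_star^(t)
source = [make_task(W_star, heads[t + 1]) for t in range(T)]
X0, y0 = make_task(W_star, heads[0])                     # target task; only the first N' samples are used for fitting

# Stage 1: gradient descent on the pooled squared loss over all source tasks,
# alternating updates of the per-task heads and the shared representation.
W_hat = rng.normal(size=(r, d)) / np.sqrt(d)
F_hat = 0.1 * rng.normal(size=(T, r))
lr = 0.05
for _ in range(500):
    grad_W = np.zeros_like(W_hat)
    for t, (X, y) in enumerate(source):
        H = np.tanh(X @ W_hat.T)                         # N x r learned features
        resid = H @ F_hat[t] - y
        F_hat[t] -= lr * H.T @ resid / N                 # update task head
        grad_W += ((resid[:, None] * (1.0 - H**2)) * F_hat[t]).T @ X / N
    W_hat -= lr * grad_W / T                             # update shared representation

# Stage 2: freeze g_hat and fit the target head by least squares on N' samples.
H0 = np.tanh(X0[:N_prime] @ W_hat.T)
f0_hat, *_ = np.linalg.lstsq(H0, y0[:N_prime], rcond=None)

# Evaluate the transferred predictor f0_hat . g_hat on fresh target data.
X_test, y_test = make_task(W_star, heads[0])
pred = np.tanh(X_test @ W_hat.T) @ f0_hat
print("target test MSE:", np.mean((pred - y_test) ** 2))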
