
Guarantees for Nonlinear Representation Learning: Non-identical Covariates, Dependent Data, Fewer Samples

International Conference on Machine Learning (ICML), 2024
Main: 16 pages
3 figures
Bibliography: 5 pages
Appendix: 17 pages
Abstract

A driving force behind the diverse applicability of modern machine learning is the ability to extract meaningful features across many sources. However, many practical domains involve data that are non-identically distributed across sources and statistically dependent within each source, violating vital assumptions in existing theoretical studies. Toward addressing these issues, we establish statistical guarantees for learning general \textit{nonlinear} representations from multiple data sources that admit different input distributions and possibly dependent data. Specifically, we study the sample complexity of learning $T+1$ functions $f_\star^{(t)} \circ g_\star$ from a function class $\mathcal F \times \mathcal G$, where $f_\star^{(t)}$ are task-specific linear functions and $g_\star$ is a shared nonlinear representation. A representation $\hat g$ is estimated using $N$ samples from each of $T$ source tasks, and a fine-tuning function $\hat f^{(0)}$ is fit using $N'$ samples from a target task passed through $\hat g$. We show that when $N \gtrsim C_{\mathrm{dep}} (\mathrm{dim}(\mathcal F) + \mathrm{C}(\mathcal G)/T)$, the excess risk of $\hat f^{(0)} \circ \hat g$ on the target task decays as $\nu_{\mathrm{div}} \big(\frac{\mathrm{dim}(\mathcal F)}{N'} + \frac{\mathrm{C}(\mathcal G)}{N T} \big)$, where $C_{\mathrm{dep}}$ denotes the effect of data dependency, $\nu_{\mathrm{div}}$ denotes an (estimable) measure of \textit{task-diversity} between the source and target tasks, and $\mathrm{C}(\mathcal G)$ denotes the complexity of the representation class $\mathcal G$. In particular, our analysis reveals: as the number of tasks $T$ increases, both the sample requirement and risk bound converge to those of $r$-dimensional regression as if $g_\star$ had been given, and the effect of dependency enters only the sample requirement, leaving the risk bound matching the iid setting.
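To make the scaling behavior of the stated bound concrete, the sketch below evaluates the excess-risk expression $\nu_{\mathrm{div}} \big(\frac{\mathrm{dim}(\mathcal F)}{N'} + \frac{\mathrm{C}(\mathcal G)}{N T}\big)$ for a few task counts $T$. All numeric values (the diversity measure, dimensions, and sample sizes) are purely illustrative assumptions, not figures from the paper.

```python
def excess_risk_bound(nu_div, dim_f, comp_g, n_source, n_target, num_tasks):
    """Evaluate the abstract's target-task excess-risk bound:
    nu_div * (dim(F)/N' + C(G)/(N*T)).

    All arguments here are hypothetical placeholder values used only to
    illustrate how the bound scales; they do not come from the paper.
    """
    return nu_div * (dim_f / n_target + comp_g / (n_source * num_tasks))

# As T grows, the representation term C(G)/(N*T) vanishes, so the bound
# approaches nu_div * dim(F)/N' -- the rate of low-dimensional regression
# as if the representation g_star had been given.
bounds = [
    excess_risk_bound(nu_div=2.0, dim_f=5, comp_g=500,
                      n_source=1000, n_target=200, num_tasks=t)
    for t in (1, 10, 100)
]
```

With these placeholder values the bound shrinks monotonically in $T$ toward the fine-tuning-only term, matching the abstract's claim that the representation cost is amortized across source tasks.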
