In-context Learning for Mixture of Linear Regressions: Existence, Generalization and Training Dynamics

18 October 2024
Yanhao Jin
Krishnakumar Balasubramanian
Lifeng Lai
Abstract

We investigate the in-context learning capabilities of transformers for the $d$-dimensional mixture of linear regression model, providing theoretical insights into their existence, generalization bounds, and training dynamics. Specifically, we prove that there exists a transformer capable of achieving a prediction error of order $\mathcal{O}(\sqrt{d/n})$ with high probability, where $n$ represents the training prompt size in the high signal-to-noise ratio (SNR) regime. Moreover, we derive in-context excess risk bounds of order $\mathcal{O}(L/\sqrt{B})$ for the case of two mixtures, where $B$ denotes the number of training prompts and $L$ represents the number of attention layers. The dependence of $L$ on the SNR is explicitly characterized, differing between low and high SNR settings. We further analyze the training dynamics of transformers with single linear self-attention layers, demonstrating that, with appropriately initialized parameters, gradient flow optimization over the population mean square loss converges to a global optimum. Extensive simulations suggest that transformers perform well on this task, potentially outperforming other baselines, such as the Expectation-Maximization algorithm.
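To make the setting concrete, the sketch below simulates a single in-context prompt from a two-component, $d$-dimensional mixture of linear regressions and fits it with an Expectation-Maximization baseline of the kind the abstract compares against. This is only an illustrative sketch: the dimension, prompt size, noise level, iteration count, and balanced-mixture assumption are placeholder choices, not values or code from the paper.

```python
# Minimal sketch (not the paper's code): one in-context prompt from a
# 2-component mixture of linear regressions, fitted with an EM baseline.
# All hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n, sigma = 8, 200, 0.1            # dimension, prompt size, noise level

# Two ground-truth regression vectors (the mixture components).
beta_true = rng.normal(size=(2, d))
X = rng.normal(size=(n, d))
z = rng.integers(0, 2, size=n)        # latent component label per example
y = np.einsum("ij,ij->i", X, beta_true[z]) + sigma * rng.normal(size=n)

# EM for a balanced two-component mixture of linear regressions.
beta = rng.normal(size=(2, d))        # random initialization
for _ in range(50):
    # E-step: posterior responsibility of each component for each example.
    resid = y[:, None] - X @ beta.T                  # shape (n, 2)
    logw = -0.5 * resid**2 / sigma**2
    logw -= logw.max(axis=1, keepdims=True)
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)
    # M-step: weighted (ridge-regularized) least squares per component.
    for k in range(2):
        Wk = w[:, k]
        A = X.T @ (Wk[:, None] * X) + 1e-6 * np.eye(d)
        beta[k] = np.linalg.solve(A, X.T @ (Wk * y))

# Report estimation error up to the inherent label-swap ambiguity.
err = min(np.linalg.norm(beta[0] - beta_true[0]),
          np.linalg.norm(beta[0] - beta_true[1]))
print("component estimation error:", err)
```

In the paper's setup, a transformer would instead consume the labeled prompt $(X, y)$ together with a query point and output a prediction directly, without an explicit E-step or M-step; the snippet only makes the data-generating process and the EM comparison point concrete.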

View on arXiv
@article{jin2025_2410.14183,
  title={In-context Learning for Mixture of Linear Regressions: Existence, Generalization and Training Dynamics},
  author={Yanhao Jin and Krishnakumar Balasubramanian and Lifeng Lai},
  journal={arXiv preprint arXiv:2410.14183},
  year={2025}
}