ResearchTrend.AI

In-Context Learning with Transformers: Softmax Attention Adapts to Function Lipschitzness
arXiv: 2402.11639 (v2, latest)

18 February 2024
Liam Collins
Advait Parulekar
Aryan Mokhtari
Sujay Sanghavi
Sanjay Shakkottai
    MLT

Papers citing "In-Context Learning with Transformers: Softmax Attention Adapts to Function Lipschitzness"

16 papers shown

  1. Softmax ≥ Linear: Transformers may learn to classify in-context by kernel gradient descent
     Sara Dragutinovic, Andrew Saxe, Aaditya K. Singh
     MLT · 12 Oct 2025

  2. From Condensation to Rank Collapse: A Two-Stage Analysis of Transformer Training Dynamics
     Zheng-an Chen, Tao Luo
     AI4CE · 08 Oct 2025

  3. Review of Hallucination Understanding in Large Language and Vision Models
     Zhengyi Ho, Siyuan Liang, D. Tao
     VLM, LRM · 26 Sep 2025

  4. Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond
     Dingzirui Wang, Xuanliang Zhang, Keyan Xu, Qingfu Zhu, Wanxiang Che, Yang Deng
     LRM · 25 Sep 2025

  5. On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective
     Gabriel Mongaras, Eric C. Larson
     31 Jul 2025

  6. MemShare: Memory Efficient Inference for Large Reasoning Models through KV Cache Reuse
     Kaiwen Chen, Xin Tan, Minchen Yu, Hong Xu
     LRM, VLM · 29 Jul 2025

  7. Provable In-Context Learning of Nonlinear Regression with Transformers
     Hongbo Li, Lingjie Duan, Yingbin Liang
     28 Jul 2025

  8. When and How Unlabeled Data Provably Improve In-Context Learning
     Yingcong Li, Xiangyu Chang, Muti Kara, Xiaofeng Liu, Amit K. Roy-Chowdhury, Samet Oymak
     18 Jun 2025

  9. Gating is Weighting: Understanding Gated Linear Attention through In-context Learning
     Yingcong Li, Davoud Ataee Tarzanagh, A. S. Rawat, Maryam Fazel, Samet Oymak
     06 Apr 2025

  10. Context-Scaling versus Task-Scaling in In-Context Learning
      Amirhesam Abedsoltan, Adityanarayanan Radhakrishnan, Jingfeng Wu, M. Belkin
      ReLM, LRM · 16 Oct 2024

  11. Can In-context Learning Really Generalize to Out-of-distribution Tasks?
      International Conference on Learning Representations (ICLR), 2024
      Qixun Wang, Yifei Wang, Yisen Wang, Xianghua Ying
      OOD · 13 Oct 2024

  12. Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond
      Yingcong Li, A. S. Rawat, Samet Oymak
      13 Jul 2024

  13. On the Power of Convolution Augmented Transformer
      Mingchen Li, Xuechen Zhang, Yixiao Huang, Samet Oymak
      08 Jul 2024

  14. How In-Context Learning Emerges from Training on Unstructured Data: On the Role of Co-Occurrence, Positional Information, and Noise Structures
      Kevin Christian Wibisono, Yixin Wang
      31 May 2024

  15. Mechanics of Next Token Prediction with Self-Attention
      International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
      Yingcong Li, Yixiao Huang, M. E. Ildiz, A. S. Rawat, Samet Oymak
      12 Mar 2024

  16. Transformers are Provably Optimal In-context Estimators for Wireless Communications
      International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
      Vishnu Teja Kunde, Vicram Rajagopalan, Chandra Shekhara Kaushik Valmeekam, Krishna R. Narayanan, S. Shakkottai, D. Kalathil, J. Chamberland
      01 Nov 2023