ResearchTrend.AI

In-Context Learning with Transformers: Softmax Attention Adapts to Function Lipschitzness
arXiv: 2402.11639 (v2, latest)

18 February 2024
Liam Collins
Advait Parulekar
Aryan Mokhtari
Sujay Sanghavi
Sanjay Shakkottai
    MLT

Papers citing "In-Context Learning with Transformers: Softmax Attention Adapts to Function Lipschitzness"

16 papers shown

  1. Softmax ≥ Linear: Transformers may learn to classify in-context by kernel gradient descent
     Sara Dragutinovic, Andrew Saxe, Aaditya K. Singh
     MLT · 12 Oct 2025

  2. From Condensation to Rank Collapse: A Two-Stage Analysis of Transformer Training Dynamics
     Zheng-an Chen, Tao Luo
     AI4CE · 08 Oct 2025

  3. Review of Hallucination Understanding in Large Language and Vision Models
     Zhengyi Ho, Siyuan Liang, D. Tao
     VLM, LRM · 26 Sep 2025

  4. Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond
     Dingzirui Wang, Xuanliang Zhang, Keyan Xu, Qingfu Zhu, Wanxiang Che, Yang Deng
     LRM · 25 Sep 2025

  5. On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective
     Gabriel Mongaras, Eric C. Larson
     31 Jul 2025

  6. MemShare: Memory Efficient Inference for Large Reasoning Models through KV Cache Reuse
     Kaiwen Chen, Xin Tan, Minchen Yu, Hong Xu
     LRM, VLM · 29 Jul 2025

  7. Provable In-Context Learning of Nonlinear Regression with Transformers
     Hongbo Li, Lingjie Duan, Yingbin Liang
     28 Jul 2025

  8. When and How Unlabeled Data Provably Improve In-Context Learning
     Yingcong Li, Xiangyu Chang, Muti Kara, Xiaofeng Liu, Amit K. Roy-Chowdhury, Samet Oymak
     18 Jun 2025

  9. Gating is Weighting: Understanding Gated Linear Attention through In-context Learning
     Yingcong Li, Davoud Ataee Tarzanagh, A. S. Rawat, Maryam Fazel, Samet Oymak
     06 Apr 2025

  10. Context-Scaling versus Task-Scaling in In-Context Learning
      Amirhesam Abedsoltan, Adityanarayanan Radhakrishnan, Jingfeng Wu, M. Belkin
      ReLM, LRM · 16 Oct 2024

  11. Can In-context Learning Really Generalize to Out-of-distribution Tasks?
      International Conference on Learning Representations (ICLR), 2024
      Qixun Wang, Yifei Wang, Yisen Wang, Xianghua Ying
      OOD · 13 Oct 2024

  12. Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond
      Yingcong Li, A. S. Rawat, Samet Oymak
      13 Jul 2024

  13. On the Power of Convolution Augmented Transformer
      Mingchen Li, Xuechen Zhang, Yixiao Huang, Samet Oymak
      08 Jul 2024

  14. How In-Context Learning Emerges from Training on Unstructured Data: On the Role of Co-Occurrence, Positional Information, and Noise Structures
      Kevin Christian Wibisono, Yixin Wang
      31 May 2024

  15. Mechanics of Next Token Prediction with Self-Attention
      International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
      Yingcong Li, Yixiao Huang, M. E. Ildiz, A. S. Rawat, Samet Oymak
      12 Mar 2024

  16. Transformers are Provably Optimal In-context Estimators for Wireless Communications
      International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
      Vishnu Teja Kunde, Vicram Rajagopalan, Chandra Shekhara Kaushik Valmeekam, Krishna R. Narayanan, S. Shakkottai, D. Kalathil, J. Chamberland
      01 Nov 2023