arXiv: 2212.07677
Transformers learn in-context by gradient descent
International Conference on Machine Learning (ICML), 2023
15 December 2022
J. von Oswald
Eyvind Niklasson
E. Randazzo
João Sacramento
A. Mordvintsev
A. Zhmoginov
Max Vladymyrov
MLT
ArXiv (abs)
PDF
HTML
HuggingFace (1 upvote)
Github (361★)
Papers citing "Transformers learn in-context by gradient descent"
50 / 456 papers shown
Equivalence of Context and Parameter Updates in Modern Transformer Blocks
Adrian Goldwaser
Michael Munn
J. Gonzalvo
Benoit Dherin
24 Dec 2025
Learning without training: The implicit dynamics of in-context learning
Benoit Dherin
Michael Munn
Hanna Mazzawi
Michael Wunder
J. Gonzalvo
ReLM
OffRL
LRM
24 Dec 2025
The brain-AI convergence: Predictive and generative world models for general-purpose computation
Shogo Ohmae
Keiko Ohmae
02 Dec 2025
Semantic Anchors in In-Context Learning: Why Small LLMs Cannot Flip Their Labels
Anantha Padmanaban Krishna Kumar
26 Nov 2025
Transformers with RL or SFT Provably Learn Sparse Boolean Functions, But Differently
Bochen Lyu
Yiyang Jia
Xiaohao Cai
Zhanxing Zhu
MoE
22 Nov 2025
Implicit Federated In-context Learning For Task-Specific LLM Fine-Tuning
Dongcheng Li
Junhan Chen
Aoxiang Zhou
Chunpei Li
Youquan Xian
Peng Liu
Xianxian Li
FedML
10 Nov 2025
Robust Experimental Design via Generalised Bayesian Inference
Yasir Zubayr Barlas
Sabina J. Sloman
Samuel Kaski
10 Nov 2025
Scaling Laws and In-Context Learning: A Unified Theoretical Framework
Sushant Mehta
Ishan Gupta
09 Nov 2025
Optimal Attention Temperature Enhances In-Context Learning under Distribution Shift
Samet Demir
Zafer Dogan
03 Nov 2025
On the Emergence of Induction Heads for In-Context Learning
Tiberiu Musat
Tiago Pimentel
Lorenzo Noci
Alessandro Stolfo
Mrinmaya Sachan
Thomas Hofmann
02 Nov 2025
Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering
Eric J. Bigelow
Daniel Wurgaft
YingQiao Wang
Noah D. Goodman
T. Ullman
Hidenori Tanaka
Ekdeep Singh Lubana
LLMSV
01 Nov 2025
Detecting Data Contamination in LLMs via In-Context Learning
Michał Zawalski
Meriem Boubdir
Klaudia Bałazy
Besmira Nushi
Pablo Ribalta
30 Oct 2025
How Data Mixing Shapes In-Context Learning: Asymptotic Equivalence for Transformers with MLPs
Samet Demir
Zafer Dogan
29 Oct 2025
Understanding Multi-View Transformers
Michal Stary
Julien Gaubil
A. Tewari
Vincent Sitzmann
ViT
28 Oct 2025
Provable test-time adaptivity and distributional robustness of in-context learning
Tianyi Ma
Tengyao Wang
R. Samworth
27 Oct 2025
Can Language Models Compose Skills In-Context?
Zidong Liu
Zhuoyan Xu
Zhenmei Shi
Yingyu Liang
ReLM
CoGe
LRM
27 Oct 2025
A Framework for Quantifying How Pre-Training and Context Benefit In-Context Learning
Bingqing Song
Jiaxiang Li
Rong Wang
Songtao Lu
Mingyi Hong
26 Oct 2025
Enabling Robust In-Context Memory and Rapid Task Adaptation in Transformers with Hebbian and Gradient-Based Plasticity
Siddharth Chaudhary
24 Oct 2025
Large Language Models as Model Organisms for Human Associative Learning
Camila Kolling
Vy A. Vo
Mariya Toneva
KELM
24 Oct 2025
Do Prompts Reshape Representations? An Empirical Study of Prompting Effects on Embeddings
Cesar Gonzalez-Gutierrez
Dirk Hovy
22 Oct 2025
Transformers are almost optimal metalearners for linear classification
Roey Magen
Gal Vardi
22 Oct 2025
How Do LLMs Use Their Depth?
Akshat Gupta
Jay Yeung
Gopala Anumanchipalli
Anna Ivanova
21 Oct 2025
Optimality and NP-Hardness of Transformers in Learning Markovian Dynamical Functions
Yanna Ding
Songtao Lu
Yingdong Lu
T. Nowicki
Jianxi Gao
21 Oct 2025
Layer Specialization Underlying Compositional Reasoning in Transformers
Jing Liu
LRM
20 Oct 2025
Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads
Zhoutong Wu
Y. Zhang
Yiming Dong
Chenheng Zhang
Cong Fang
Kun Yuan
Zhouchen Lin
19 Oct 2025
LLM-ERM: Sample-Efficient Program Learning via LLM-Guided Search
Shivam Singhal
Eran Malach
T. Poggio
Tomer Galanti
16 Oct 2025
Rewiring Experts on the Fly: Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert models
Guinan Su
Yanwu Yang
Li Shen
Lu Yin
Shiwei Liu
Jonas Geiping
MoE
KELM
16 Oct 2025
Mamba Can Learn Low-Dimensional Targets In-Context via Test-Time Feature Learning
Junsoo Oh
Wei Huang
Taiji Suzuki
14 Oct 2025
In-Context Learning Is Provably Bayesian Inference: A Generalization Theory for Meta-Learning
Tomoya Wakayama
Taiji Suzuki
UQCV
BDL
13 Oct 2025
Dimension-Free Minimax Rates for Learning Pairwise Interactions in Attention-Style Models
Shai Zucker
Xiong Wang
Fei Lu
Inbar Seroussi
13 Oct 2025
Softmax ≥ Linear: Transformers may learn to classify in-context by kernel gradient descent
Sara Dragutinovic
Andrew Saxe
Aaditya K. Singh
MLT
12 Oct 2025
Design Principles for Sequence Models via Coefficient Dynamics
Jerome Sieber
Antonio Orvieto
Melanie Zeilinger
Carmen Amo Alonso
10 Oct 2025
Hyperspectral data augmentation with transformer-based diffusion models
Mattia Ferrari
Lorenzo Bruzzone
09 Oct 2025
Fine-Grained Emotion Recognition via In-Context Learning
Zhaochun Ren
Zhou Yang
Chenglong Ye
Haizhou Sun
Chao Chen
Xiaofei Zhu
Xiangwen Liao
08 Oct 2025
Learning Linear Regression with Low-Rank Tasks in-Context
Kaito Takanami
Takashi Takahashi
Y. Kabashima
06 Oct 2025
ContextNav: Towards Agentic Multimodal In-Context Learning
Honghao Fu
Yuan Ouyang
Kai-Wei Chang
Yiwei Wang
Zi Huang
Yujun Cai
06 Oct 2025
Multi-Agent Collaborative Intelligence: Dual-Dial Control for Reliable LLM Reasoning
Edward Y. Chang
Ethan Chang
06 Oct 2025
Allocation of Parameters in Transformers
Ruoxi Yu
Haotian Jiang
Jingpu Cheng
Penghao Yu
Qianxiao Li
Zhong Li
MoE
04 Oct 2025
Take Goodhart Seriously: Principled Limit on General-Purpose AI Optimization
Antoine Maier
Aude Maier
Tom David
03 Oct 2025
Multi-Agent Design Assistant for the Simulation of Inertial Fusion Energy
Meir H. Shachar
D. Sterbentz
Harshitha Menon
C. Jekel
M. Giselle Fernández-Godino
...
Kevin Korner
Robert Rieben
D. White
William J. Schill
Jonathan L. Belof
AI4CE
02 Oct 2025
Pool Me Wisely: On the Effect of Pooling in Transformer-Based Models
Sofiane Ennadir
Levente Zólyomi
Oleg Smirnov
Tianze Wang
John Pertoft
Filip Cornell
Lele Cao
02 Oct 2025
Compositional meta-learning through probabilistic task inference
Jacob J. W. Bakermans
Pablo Tano
Reidar Riveland
Charles Findling
Alexandre Pouget
CLL
02 Oct 2025
Can Mamba Learn In Context with Outliers? A Theoretical Generalization Analysis
Hongkang Li
Songtao Lu
Xiaodong Cui
Pin-Yu Chen
Meng Wang
MLT
01 Oct 2025
Theory of Scaling Laws for In-Context Regression: Depth, Width, Context and Time
Blake Bordelon
Mary I. Letey
Cengiz Pehlevan
01 Oct 2025
Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression
Yifei Zuo
Yutong Yin
Zhichen Zeng
Ang Li
Banghua Zhu
Zhaoran Wang
01 Oct 2025
Pretrain-Test Task Alignment Governs Generalization in In-Context Learning
Mary I. Letey
Jacob A. Zavatone-Veth
Yue M. Lu
Cengiz Pehlevan
30 Sep 2025
Test time training enhances in-context learning of nonlinear functions
Kento Kuwataka
Taiji Suzuki
30 Sep 2025
TTT3R: 3D Reconstruction as Test-Time Training
Xingyu Chen
Yue Chen
Yuliang Xiu
Andreas Geiger
Anpei Chen
3DV
30 Sep 2025
In-Context Compositional Q-Learning for Offline Reinforcement Learning
Qiushui Xu
Yuhao Huang
Yushu Jiang
Lei Song
Jinyu Wang
Wenliang Zheng
Jiang Bian
OffRL
28 Sep 2025
From Harm to Help: Turning Reasoning In-Context Demos into Assets for Reasoning LMs
Haonan Wang
Weida Liang
Zihang Fu
Nie Zheng
Y. Zhang
...
Tongyao Zhu
Hao Jiang
Chuang Li
Jiaying Wu
Kenji Kawaguchi
ReLM
LRM
27 Sep 2025