arXiv:2212.07677 (v2, latest)
Transformers learn in-context by gradient descent
International Conference on Machine Learning (ICML), 2023
15 December 2022
Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, Max Vladymyrov
MLT
Links: arXiv (abs) · PDF · HTML · Hugging Face (1 upvote) · GitHub (361★)
Papers citing "Transformers learn in-context by gradient descent" (showing 50 of 453)
Semantic Anchors in In-Context Learning: Why Small LLMs Cannot Flip Their Labels
Anantha Padmanaban Krishna Kumar · 26 Nov 2025
Transformers with RL or SFT Provably Learn Sparse Boolean Functions, But Differently
Bochen Lyu, Yiyang Jia, Xiaohao Cai, Zhanxing Zhu · MoE · 22 Nov 2025
Implicit Federated In-context Learning For Task-Specific LLM Fine-Tuning
Dongcheng Li, Junhan Chen, Aoxiang Zhou, Chunpei Li, Youquan Xian, Peng Liu, Xianxian Li · FedML · 10 Nov 2025
Robust Experimental Design via Generalised Bayesian Inference
Yasir Zubayr Barlas, Sabina J. Sloman, Samuel Kaski · 10 Nov 2025
Scaling Laws and In-Context Learning: A Unified Theoretical Framework
Sushant Mehta, Ishan Gupta · 09 Nov 2025
Optimal Attention Temperature Enhances In-Context Learning under Distribution Shift
Samet Demir, Zafer Dogan · 03 Nov 2025
On the Emergence of Induction Heads for In-Context Learning
Tiberiu Musat, Tiago Pimentel, Lorenzo Noci, Alessandro Stolfo, Mrinmaya Sachan, Thomas Hofmann · 02 Nov 2025
Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering
Eric J. Bigelow, Daniel Wurgaft, YingQiao Wang, Noah D. Goodman, T. Ullman, Hidenori Tanaka, Ekdeep Singh Lubana · LLMSV · 01 Nov 2025
Detecting Data Contamination in LLMs via In-Context Learning
Michał Zawalski, Meriem Boubdir, Klaudia Bałazy, Besmira Nushi, Pablo Ribalta · 30 Oct 2025
How Data Mixing Shapes In-Context Learning: Asymptotic Equivalence for Transformers with MLPs
Samet Demir, Zafer Dogan · 29 Oct 2025
Understanding Multi-View Transformers
Michal Stary, Julien Gaubil, A. Tewari, Vincent Sitzmann · ViT · 28 Oct 2025
Provable test-time adaptivity and distributional robustness of in-context learning
Tianyi Ma, Tengyao Wang, R. Samworth · 27 Oct 2025
Can Language Models Compose Skills In-Context?
Zidong Liu, Zhuoyan Xu, Zhenmei Shi, Yingyu Liang · ReLM, CoGe, LRM · 27 Oct 2025
A Framework for Quantifying How Pre-Training and Context Benefit In-Context Learning
Bingqing Song, Jiaxiang Li, Rong Wang, Songtao Lu, Mingyi Hong · 26 Oct 2025
Enabling Robust In-Context Memory and Rapid Task Adaptation in Transformers with Hebbian and Gradient-Based Plasticity
Siddharth Chaudhary · 24 Oct 2025
Large Language Models as Model Organisms for Human Associative Learning
Camila Kolling, Vy A. Vo, Mariya Toneva · KELM · 24 Oct 2025
Transformers are almost optimal metalearners for linear classification
Roey Magen, Gal Vardi · 22 Oct 2025
Do Prompts Reshape Representations? An Empirical Study of Prompting Effects on Embeddings
Cesar Gonzalez-Gutierrez, Dirk Hovy · 22 Oct 2025
Optimality and NP-Hardness of Transformers in Learning Markovian Dynamical Functions
Yanna Ding, Songtao Lu, Yingdong Lu, T. Nowicki, Jianxi Gao · 21 Oct 2025
How Do LLMs Use Their Depth?
Akshat Gupta, Jay Yeung, Gopala Anumanchipalli, Anna Ivanova · 21 Oct 2025
Layer Specialization Underlying Compositional Reasoning in Transformers
Jing Liu · LRM · 20 Oct 2025
Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads
Zhoutong Wu, Y. Zhang, Yiming Dong, Chenheng Zhang, Cong Fang, Kun Yuan, Zhouchen Lin · 19 Oct 2025
LLM-ERM: Sample-Efficient Program Learning via LLM-Guided Search
Shivam Singhal, Eran Malach, T. Poggio, Tomer Galanti · 16 Oct 2025
Rewiring Experts on the Fly: Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert Models
Guinan Su, Yanwu Yang, Li Shen, Lu Yin, Shiwei Liu, Jonas Geiping · MoE, KELM · 16 Oct 2025
Mamba Can Learn Low-Dimensional Targets In-Context via Test-Time Feature Learning
Junsoo Oh, Wei Huang, Taiji Suzuki · 14 Oct 2025
In-Context Learning Is Provably Bayesian Inference: A Generalization Theory for Meta-Learning
Tomoya Wakayama, Taiji Suzuki · UQCV, BDL · 13 Oct 2025
Dimension-Free Minimax Rates for Learning Pairwise Interactions in Attention-Style Models
Shai Zucker, Xiong Wang, Fei Lu, Inbar Seroussi · 13 Oct 2025
Softmax ≥ Linear: Transformers may learn to classify in-context by kernel gradient descent
Sara Dragutinovic, Andrew Saxe, Aaditya K. Singh · MLT · 12 Oct 2025
Design Principles for Sequence Models via Coefficient Dynamics
Jerome Sieber, Antonio Orvieto, Melanie Zeilinger, Carmen Amo Alonso · 10 Oct 2025
Hyperspectral data augmentation with transformer-based diffusion models
Mattia Ferrari, Lorenzo Bruzzone · 09 Oct 2025
Fine-Grained Emotion Recognition via In-Context Learning
Zhaochun Ren, Zhou Yang, Chenglong Ye, Haizhou Sun, Chao Chen, Xiaofei Zhu, Xiangwen Liao · 08 Oct 2025
Multi-Agent Collaborative Intelligence: Dual-Dial Control for Reliable LLM Reasoning
Edward Y. Chang, Ethan Chang · 06 Oct 2025
ContextNav: Towards Agentic Multimodal In-Context Learning
Honghao Fu, Yuan Ouyang, Kai-Wei Chang, Yiwei Wang, Zi Huang, Yujun Cai · 06 Oct 2025
Learning Linear Regression with Low-Rank Tasks in-Context
Kaito Takanami, Takashi Takahashi, Y. Kabashima · 06 Oct 2025
Allocation of Parameters in Transformers
Ruoxi Yu, Haotian Jiang, Jingpu Cheng, Penghao Yu, Qianxiao Li, Zhong Li · MoE · 04 Oct 2025
Take Goodhart Seriously: Principled Limit on General-Purpose AI Optimization
Antoine Maier, Aude Maier, Tom David · 03 Oct 2025
Compositional meta-learning through probabilistic task inference
Jacob J. W. Bakermans, Pablo Tano, Reidar Riveland, Charles Findling, Alexandre Pouget · CLL · 02 Oct 2025
Multi-Agent Design Assistant for the Simulation of Inertial Fusion Energy
Meir H. Shachar, D. Sterbentz, Harshitha Menon, C. Jekel, M. Giselle Fernández-Godino, ..., Kevin Korner, Robert Rieben, D. White, William J. Schill, Jonathan L. Belof · AI4CE · 02 Oct 2025
Pool Me Wisely: On the Effect of Pooling in Transformer-Based Models
Sofiane Ennadir, Levente Zólyomi, Oleg Smirnov, Tianze Wang, John Pertoft, Filip Cornell, Lele Cao · 02 Oct 2025
Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression
Yifei Zuo, Yutong Yin, Zhichen Zeng, Ang Li, Banghua Zhu, Zhaoran Wang · 01 Oct 2025
Theory of Scaling Laws for In-Context Regression: Depth, Width, Context and Time
Blake Bordelon, Mary I. Letey, Cengiz Pehlevan · 01 Oct 2025
Can Mamba Learn In Context with Outliers? A Theoretical Generalization Analysis
Hongkang Li, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, Meng Wang · MLT · 01 Oct 2025
Pretrain-Test Task Alignment Governs Generalization in In-Context Learning
Mary I. Letey, Jacob A. Zavatone-Veth, Yue M. Lu, Cengiz Pehlevan · 30 Sep 2025
TTT3R: 3D Reconstruction as Test-Time Training
Xingyu Chen, Yue Chen, Yuliang Xiu, Andreas Geiger, Anpei Chen · 3DV · 30 Sep 2025
Test time training enhances in-context learning of nonlinear functions
Kento Kuwataka, Taiji Suzuki · 30 Sep 2025
In-Context Compositional Q-Learning for Offline Reinforcement Learning
Qiushui Xu, Yuhao Huang, Yushu Jiang, Lei Song, Jinyu Wang, Wenliang Zheng, Jiang Bian · OffRL · 28 Sep 2025
From Harm to Help: Turning Reasoning In-Context Demos into Assets for Reasoning LMs
Haonan Wang, Weida Liang, Zihang Fu, Nie Zheng, Y. Zhang, ..., Tongyao Zhu, Hao Jiang, Chuang Li, Jiaying Wu, Kenji Kawaguchi · ReLM, LRM · 27 Sep 2025
IA2: Alignment with ICL Activations Improves Supervised Fine-Tuning
Aayush Mishra, Daniel Khashabi, Anqi Liu · 26 Sep 2025
Statistical Advantage of Softmax Attention: Insights from Single-Location Regression
O. Duranthon, P. Marion, C. Boyer, B. Loureiro, L. Zdeborová · 26 Sep 2025
Towards Generalizable Implicit In-Context Learning with Attention Routing
Jiaqian Li, Yanshu Li, Ligong Han, Ruixiang Tang, Wenya Wang · 26 Sep 2025