arXiv: 2212.07677
Transformers learn in-context by gradient descent
International Conference on Machine Learning (ICML), 2023
15 December 2022
J. von Oswald
Eyvind Niklasson
E. Randazzo
João Sacramento
A. Mordvintsev
A. Zhmoginov
Max Vladymyrov
MLT
ArXiv (abs)
PDF
HTML
HuggingFace (1 upvote)
Github (361★)
Papers citing "Transformers learn in-context by gradient descent"
50 / 456 papers shown
Equivalence of Context and Parameter Updates in Modern Transformer Blocks
Adrian Goldwaser
Michael Munn
J. Gonzalvo
Benoit Dherin
24 Dec 2025
Learning without training: The implicit dynamics of in-context learning
Benoit Dherin
Michael Munn
Hanna Mazzawi
Michael Wunder
J. Gonzalvo
ReLM
OffRL
LRM
24 Dec 2025
The brain-AI convergence: Predictive and generative world models for general-purpose computation
Shogo Ohmae
Keiko Ohmae
02 Dec 2025
Semantic Anchors in In-Context Learning: Why Small LLMs Cannot Flip Their Labels
Anantha Padmanaban Krishna Kumar
26 Nov 2025
Transformers with RL or SFT Provably Learn Sparse Boolean Functions, But Differently
Bochen Lyu
Yiyang Jia
Xiaohao Cai
Zhanxing Zhu
MoE
22 Nov 2025
Implicit Federated In-context Learning For Task-Specific LLM Fine-Tuning
Dongcheng Li
Junhan Chen
Aoxiang Zhou
Chunpei Li
Youquan Xian
Peng Liu
Xianxian Li
FedML
10 Nov 2025
Robust Experimental Design via Generalised Bayesian Inference
Yasir Zubayr Barlas
Sabina J. Sloman
Samuel Kaski
10 Nov 2025
Scaling Laws and In-Context Learning: A Unified Theoretical Framework
Sushant Mehta
Ishan Gupta
09 Nov 2025
Optimal Attention Temperature Enhances In-Context Learning under Distribution Shift
Samet Demir
Zafer Dogan
03 Nov 2025
On the Emergence of Induction Heads for In-Context Learning
Tiberiu Musat
Tiago Pimentel
Lorenzo Noci
Alessandro Stolfo
Mrinmaya Sachan
Thomas Hofmann
02 Nov 2025
Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering
Eric J. Bigelow
Daniel Wurgaft
YingQiao Wang
Noah D. Goodman
T. Ullman
Hidenori Tanaka
Ekdeep Singh Lubana
LLMSV
01 Nov 2025
Detecting Data Contamination in LLMs via In-Context Learning
Michał Zawalski
Meriem Boubdir
Klaudia Bałazy
Besmira Nushi
Pablo Ribalta
30 Oct 2025
How Data Mixing Shapes In-Context Learning: Asymptotic Equivalence for Transformers with MLPs
Samet Demir
Zafer Dogan
29 Oct 2025
Understanding Multi-View Transformers
Michal Stary
Julien Gaubil
A. Tewari
Vincent Sitzmann
ViT
28 Oct 2025
Provable test-time adaptivity and distributional robustness of in-context learning
Tianyi Ma
Tengyao Wang
R. Samworth
27 Oct 2025
Can Language Models Compose Skills In-Context?
Zidong Liu
Zhuoyan Xu
Zhenmei Shi
Yingyu Liang
ReLM
CoGe
LRM
27 Oct 2025
A Framework for Quantifying How Pre-Training and Context Benefit In-Context Learning
Bingqing Song
Jiaxiang Li
Rong Wang
Songtao Lu
Mingyi Hong
26 Oct 2025
Enabling Robust In-Context Memory and Rapid Task Adaptation in Transformers with Hebbian and Gradient-Based Plasticity
Siddharth Chaudhary
24 Oct 2025
Large Language Models as Model Organisms for Human Associative Learning
Camila Kolling
Vy A. Vo
Mariya Toneva
KELM
24 Oct 2025
Do Prompts Reshape Representations? An Empirical Study of Prompting Effects on Embeddings
Cesar Gonzalez-Gutierrez
Dirk Hovy
22 Oct 2025
Transformers are almost optimal metalearners for linear classification
Roey Magen
Gal Vardi
22 Oct 2025
How Do LLMs Use Their Depth?
Akshat Gupta
Jay Yeung
Gopala Anumanchipalli
Anna Ivanova
21 Oct 2025
Optimality and NP-Hardness of Transformers in Learning Markovian Dynamical Functions
Yanna Ding
Songtao Lu
Yingdong Lu
T. Nowicki
Jianxi Gao
21 Oct 2025
Layer Specialization Underlying Compositional Reasoning in Transformers
Jing Liu
LRM
20 Oct 2025
Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads
Zhoutong Wu
Y. Zhang
Yiming Dong
Chenheng Zhang
Cong Fang
Kun Yuan
Zhouchen Lin
19 Oct 2025
LLM-ERM: Sample-Efficient Program Learning via LLM-Guided Search
Shivam Singhal
Eran Malach
T. Poggio
Tomer Galanti
16 Oct 2025
Rewiring Experts on the Fly: Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert models
Guinan Su
Yanwu Yang
Li Shen
Lu Yin
Shiwei Liu
Jonas Geiping
MoE
KELM
16 Oct 2025
Mamba Can Learn Low-Dimensional Targets In-Context via Test-Time Feature Learning
Junsoo Oh
Wei Huang
Taiji Suzuki
14 Oct 2025
In-Context Learning Is Provably Bayesian Inference: A Generalization Theory for Meta-Learning
Tomoya Wakayama
Taiji Suzuki
UQCV
BDL
13 Oct 2025
Dimension-Free Minimax Rates for Learning Pairwise Interactions in Attention-Style Models
Shai Zucker
Xiong Wang
Fei Lu
Inbar Seroussi
13 Oct 2025
Softmax ≥ Linear: Transformers may learn to classify in-context by kernel gradient descent
Sara Dragutinovic
Andrew Saxe
Aaditya K. Singh
MLT
12 Oct 2025
Design Principles for Sequence Models via Coefficient Dynamics
Jerome Sieber
Antonio Orvieto
Melanie Zeilinger
Carmen Amo Alonso
10 Oct 2025
Hyperspectral data augmentation with transformer-based diffusion models
Mattia Ferrari
Lorenzo Bruzzone
09 Oct 2025
Fine-Grained Emotion Recognition via In-Context Learning
Zhaochun Ren
Zhou Yang
Chenglong Ye
Haizhou Sun
Chao Chen
Xiaofei Zhu
Xiangwen Liao
08 Oct 2025
Learning Linear Regression with Low-Rank Tasks in-Context
Kaito Takanami
Takashi Takahashi
Y. Kabashima
06 Oct 2025
ContextNav: Towards Agentic Multimodal In-Context Learning
Honghao Fu
Yuan Ouyang
Kai-Wei Chang
Yiwei Wang
Zi Huang
Yujun Cai
06 Oct 2025
Multi-Agent Collaborative Intelligence: Dual-Dial Control for Reliable LLM Reasoning
Edward Y. Chang
Ethan Chang
06 Oct 2025
Allocation of Parameters in Transformers
Ruoxi Yu
Haotian Jiang
Jingpu Cheng
Penghao Yu
Qianxiao Li
Zhong Li
MoE
04 Oct 2025
Take Goodhart Seriously: Principled Limit on General-Purpose AI Optimization
Antoine Maier
Aude Maier
Tom David
03 Oct 2025
Multi-Agent Design Assistant for the Simulation of Inertial Fusion Energy
Meir H. Shachar
D. Sterbentz
Harshitha Menon
C. Jekel
M. Giselle Fernández-Godino
...
Kevin Korner
Robert Rieben
D. White
William J. Schill
Jonathan L. Belof
AI4CE
02 Oct 2025
Pool Me Wisely: On the Effect of Pooling in Transformer-Based Models
Sofiane Ennadir
Levente Zólyomi
Oleg Smirnov
Tianze Wang
John Pertoft
Filip Cornell
Lele Cao
02 Oct 2025
Compositional meta-learning through probabilistic task inference
Jacob J. W. Bakermans
Pablo Tano
Reidar Riveland
Charles Findling
Alexandre Pouget
CLL
02 Oct 2025
Can Mamba Learn In Context with Outliers? A Theoretical Generalization Analysis
Hongkang Li
Songtao Lu
Xiaodong Cui
Pin-Yu Chen
Meng Wang
MLT
01 Oct 2025
Theory of Scaling Laws for In-Context Regression: Depth, Width, Context and Time
Blake Bordelon
Mary I. Letey
Cengiz Pehlevan
01 Oct 2025
Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression
Yifei Zuo
Yutong Yin
Zhichen Zeng
Ang Li
Banghua Zhu
Zhaoran Wang
01 Oct 2025
Pretrain-Test Task Alignment Governs Generalization in In-Context Learning
Mary I. Letey
Jacob A. Zavatone-Veth
Yue M. Lu
Cengiz Pehlevan
30 Sep 2025
Test time training enhances in-context learning of nonlinear functions
Kento Kuwataka
Taiji Suzuki
30 Sep 2025
TTT3R: 3D Reconstruction as Test-Time Training
Xingyu Chen
Yue Chen
Yuliang Xiu
Andreas Geiger
Anpei Chen
3DV
30 Sep 2025
In-Context Compositional Q-Learning for Offline Reinforcement Learning
Qiushui Xu
Yuhao Huang
Yushu Jiang
Lei Song
Jinyu Wang
Wenliang Zheng
Jiang Bian
OffRL
28 Sep 2025
From Harm to Help: Turning Reasoning In-Context Demos into Assets for Reasoning LMs
Haonan Wang
Weida Liang
Zihang Fu
Nie Zheng
Y. Zhang
...
Tongyao Zhu
Hao Jiang
Chuang Li
Jiaying Wu
Kenji Kawaguchi
ReLM
LRM
27 Sep 2025