arXiv:2212.07677 (v2, latest)
Transformers learn in-context by gradient descent
International Conference on Machine Learning (ICML), 2023
15 December 2022
Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, Max Vladymyrov
MLT
Links: arXiv (abs) · PDF · HTML · Hugging Face (1 upvote) · GitHub (361★)
Papers citing "Transformers learn in-context by gradient descent" (showing 50 of 453)
Semantic Anchors in In-Context Learning: Why Small LLMs Cannot Flip Their Labels
Anantha Padmanaban Krishna Kumar · 26 Nov 2025
Transformers with RL or SFT Provably Learn Sparse Boolean Functions, But Differently
Bochen Lyu, Yiyang Jia, Xiaohao Cai, Zhanxing Zhu · MoE · 22 Nov 2025
Implicit Federated In-context Learning For Task-Specific LLM Fine-Tuning
Dongcheng Li, Junhan Chen, Aoxiang Zhou, Chunpei Li, Youquan Xian, Peng Liu, Xianxian Li · FedML · 10 Nov 2025
Robust Experimental Design via Generalised Bayesian Inference
Yasir Zubayr Barlas, Sabina J. Sloman, Samuel Kaski · 10 Nov 2025
Scaling Laws and In-Context Learning: A Unified Theoretical Framework
Sushant Mehta, Ishan Gupta · 09 Nov 2025
Optimal Attention Temperature Enhances In-Context Learning under Distribution Shift
Samet Demir, Zafer Dogan · 03 Nov 2025
On the Emergence of Induction Heads for In-Context Learning
Tiberiu Musat, Tiago Pimentel, Lorenzo Noci, Alessandro Stolfo, Mrinmaya Sachan, Thomas Hofmann · 02 Nov 2025
Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering
Eric J. Bigelow, Daniel Wurgaft, YingQiao Wang, Noah D. Goodman, T. Ullman, Hidenori Tanaka, Ekdeep Singh Lubana · LLMSV · 01 Nov 2025
Detecting Data Contamination in LLMs via In-Context Learning
Michał Zawalski, Meriem Boubdir, Klaudia Bałazy, Besmira Nushi, Pablo Ribalta · 30 Oct 2025
How Data Mixing Shapes In-Context Learning: Asymptotic Equivalence for Transformers with MLPs
Samet Demir, Zafer Dogan · 29 Oct 2025
Understanding Multi-View Transformers
Michal Stary, Julien Gaubil, A. Tewari, Vincent Sitzmann · ViT · 28 Oct 2025
Provable test-time adaptivity and distributional robustness of in-context learning
Tianyi Ma, Tengyao Wang, R. Samworth · 27 Oct 2025
Can Language Models Compose Skills In-Context?
Zidong Liu, Zhuoyan Xu, Zhenmei Shi, Yingyu Liang · ReLM, CoGe, LRM · 27 Oct 2025
A Framework for Quantifying How Pre-Training and Context Benefit In-Context Learning
Bingqing Song, Jiaxiang Li, Rong Wang, Songtao Lu, Mingyi Hong · 26 Oct 2025
Enabling Robust In-Context Memory and Rapid Task Adaptation in Transformers with Hebbian and Gradient-Based Plasticity
Siddharth Chaudhary · 24 Oct 2025
Large Language Models as Model Organisms for Human Associative Learning
Camila Kolling, Vy A. Vo, Mariya Toneva · KELM · 24 Oct 2025
Transformers are almost optimal metalearners for linear classification
Roey Magen, Gal Vardi · 22 Oct 2025
Do Prompts Reshape Representations? An Empirical Study of Prompting Effects on Embeddings
Cesar Gonzalez-Gutierrez, Dirk Hovy · 22 Oct 2025
Optimality and NP-Hardness of Transformers in Learning Markovian Dynamical Functions
Yanna Ding, Songtao Lu, Yingdong Lu, T. Nowicki, Jianxi Gao · 21 Oct 2025
How Do LLMs Use Their Depth?
Akshat Gupta, Jay Yeung, Gopala Anumanchipalli, Anna Ivanova · 21 Oct 2025
Layer Specialization Underlying Compositional Reasoning in Transformers
Jing Liu · LRM · 20 Oct 2025
Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads
Zhoutong Wu, Y. Zhang, Yiming Dong, Chenheng Zhang, Cong Fang, Kun Yuan, Zhouchen Lin · 19 Oct 2025
LLM-ERM: Sample-Efficient Program Learning via LLM-Guided Search
Shivam Singhal, Eran Malach, T. Poggio, Tomer Galanti · 16 Oct 2025
Rewiring Experts on the Fly: Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert Models
Guinan Su, Yanwu Yang, Li Shen, Lu Yin, Shiwei Liu, Jonas Geiping · MoE, KELM · 16 Oct 2025
Mamba Can Learn Low-Dimensional Targets In-Context via Test-Time Feature Learning
Junsoo Oh, Wei Huang, Taiji Suzuki · 14 Oct 2025
In-Context Learning Is Provably Bayesian Inference: A Generalization Theory for Meta-Learning
Tomoya Wakayama, Taiji Suzuki · UQCV, BDL · 13 Oct 2025
Dimension-Free Minimax Rates for Learning Pairwise Interactions in Attention-Style Models
Shai Zucker, Xiong Wang, Fei Lu, Inbar Seroussi · 13 Oct 2025
Softmax ≥ Linear: Transformers may learn to classify in-context by kernel gradient descent
Sara Dragutinovic, Andrew Saxe, Aaditya K. Singh · MLT · 12 Oct 2025
Design Principles for Sequence Models via Coefficient Dynamics
Jerome Sieber, Antonio Orvieto, Melanie Zeilinger, Carmen Amo Alonso · 10 Oct 2025
Hyperspectral data augmentation with transformer-based diffusion models
Mattia Ferrari, Lorenzo Bruzzone · 09 Oct 2025
Fine-Grained Emotion Recognition via In-Context Learning
Zhaochun Ren, Zhou Yang, Chenglong Ye, Haizhou Sun, Chao Chen, Xiaofei Zhu, Xiangwen Liao · 08 Oct 2025
Multi-Agent Collaborative Intelligence: Dual-Dial Control for Reliable LLM Reasoning
Edward Y. Chang, Ethan Chang · 06 Oct 2025
ContextNav: Towards Agentic Multimodal In-Context Learning
Honghao Fu, Yuan Ouyang, Kai-Wei Chang, Yiwei Wang, Zi Huang, Yujun Cai · 06 Oct 2025
Learning Linear Regression with Low-Rank Tasks in-Context
Kaito Takanami, Takashi Takahashi, Y. Kabashima · 06 Oct 2025
Allocation of Parameters in Transformers
Ruoxi Yu, Haotian Jiang, Jingpu Cheng, Penghao Yu, Qianxiao Li, Zhong Li · MoE · 04 Oct 2025
Take Goodhart Seriously: Principled Limit on General-Purpose AI Optimization
Antoine Maier, Aude Maier, Tom David · 03 Oct 2025
Compositional meta-learning through probabilistic task inference
Jacob J. W. Bakermans, Pablo Tano, Reidar Riveland, Charles Findling, Alexandre Pouget · CLL · 02 Oct 2025
Multi-Agent Design Assistant for the Simulation of Inertial Fusion Energy
Meir H. Shachar, D. Sterbentz, Harshitha Menon, C. Jekel, M. Giselle Fernández-Godino, ..., Kevin Korner, Robert Rieben, D. White, William J. Schill, Jonathan L. Belof · AI4CE · 02 Oct 2025
Pool Me Wisely: On the Effect of Pooling in Transformer-Based Models
Sofiane Ennadir, Levente Zólyomi, Oleg Smirnov, Tianze Wang, John Pertoft, Filip Cornell, Lele Cao · 02 Oct 2025
Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression
Yifei Zuo, Yutong Yin, Zhichen Zeng, Ang Li, Banghua Zhu, Zhaoran Wang · 01 Oct 2025
Theory of Scaling Laws for In-Context Regression: Depth, Width, Context and Time
Blake Bordelon, Mary I. Letey, Cengiz Pehlevan · 01 Oct 2025
Can Mamba Learn In Context with Outliers? A Theoretical Generalization Analysis
Hongkang Li, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, Meng Wang · MLT · 01 Oct 2025
Pretrain-Test Task Alignment Governs Generalization in In-Context Learning
Mary I. Letey, Jacob A. Zavatone-Veth, Yue M. Lu, Cengiz Pehlevan · 30 Sep 2025
TTT3R: 3D Reconstruction as Test-Time Training
Xingyu Chen, Yue Chen, Yuliang Xiu, Andreas Geiger, Anpei Chen · 3DV · 30 Sep 2025
Test time training enhances in-context learning of nonlinear functions
Kento Kuwataka, Taiji Suzuki · 30 Sep 2025
In-Context Compositional Q-Learning for Offline Reinforcement Learning
Qiushui Xu, Yuhao Huang, Yushu Jiang, Lei Song, Jinyu Wang, Wenliang Zheng, Jiang Bian · OffRL · 28 Sep 2025
From Harm to Help: Turning Reasoning In-Context Demos into Assets for Reasoning LMs
Haonan Wang, Weida Liang, Zihang Fu, Nie Zheng, Y. Zhang, ..., Tongyao Zhu, Hao Jiang, Chuang Li, Jiaying Wu, Kenji Kawaguchi · ReLM, LRM · 27 Sep 2025
IA2: Alignment with ICL Activations Improves Supervised Fine-Tuning
Aayush Mishra, Daniel Khashabi, Anqi Liu · 26 Sep 2025
Statistical Advantage of Softmax Attention: Insights from Single-Location Regression
O. Duranthon, P. Marion, C. Boyer, B. Loureiro, L. Zdeborová · 26 Sep 2025
Towards Generalizable Implicit In-Context Learning with Attention Routing
Jiaqian Li, Yanshu Li, Ligong Han, Ruixiang Tang, Wenya Wang · 26 Sep 2025