Transformers learn in-context by gradient descent

International Conference on Machine Learning (ICML), 2022

15 December 2022

Papers citing "Transformers learn in-context by gradient descent"

50 / 453 papers shown

Title
Statistical Advantage of Softmax Attention: Insights from Single-Location Regression O. Duranthon P. Marion C. Boyer B. Loureiro L. Zdeborová 88 0 0 26 Sep 2025
Towards Generalizable Implicit In-Context Learning with Attention Routing Jiaqian Li Yanshu Li Ligong Han Ruixiang Tang Wenya Wang 80 0 0 26 Sep 2025
On Theoretical Interpretations of Concept-Based In-Context Learning Huaze Tang Tianren Peng Shao-Lun Huang 137 0 0 25 Sep 2025
A circuit for predicting hierarchical structure in-context in Large Language Models Tankred Saanum Can Demircan Samuel Gershman Eric Schulz 84 0 0 25 Sep 2025
Linear Transformers Implicitly Discover Unified Numerical Algorithms Patrick Lutz Aditya Gangrade Hadi Daneshmand Venkatesh Saligrama 40 0 0 24 Sep 2025
Asymptotic Study of In-context Learning with Random Transformers through Equivalent Models Samet Demir Zafer Dogan 80 2 0 18 Sep 2025
Selective Induction Heads: How Transformers Select Causal Structures In Context International Conference on Learning Representations (ICLR), 2025 Francesco D'Angelo Francesco Croce Nicolas Flammarion 76 4 0 09 Sep 2025
InSQuAD: In-Context Learning for Efficient Retrieval via Submodular Mutual Information to Enforce Quality and Diversity Souradeep Nanda Anay Majee Rishabh K. Iyer 59 0 0 28 Aug 2025
Just-in-time and distributed task representations in language models Yuxuan Li Declan Campbell Stephanie Chan Andrew Kyle Lampinen 172 1 0 28 Aug 2025
Fast weight programming and linear transformers: from machine learning to neurobiology Kazuki Irie Samuel J. Gershman 104 0 0 11 Aug 2025
Towards Theoretical Understanding of Transformer Test-Time Computing: Investigation on In-Context Linear Regression Xingwu Chen Miao Lu Beining Wu Difan Zou 113 0 0 11 Aug 2025
From Text to Trajectories: GPT-2 as an ODE Solver via In-Context Ziyang Ma Baojian Zhou Deqing Yang Yanghua Xiao 96 0 0 05 Aug 2025
Transformers in Pseudo-Random Number Generation: A Dual Perspective on Theory and Practice Ran Li Lingshu Zeng 94 0 0 02 Aug 2025
Provable In-Context Learning of Nonlinear Regression with Transformers Hongbo Li Lingjie Duan Yingbin Liang 119 1 0 28 Jul 2025
Towards Compute-Optimal Many-Shot In-Context Learning Shahriar Golchin Yanfei Chen Rujun Han Manan Gandhi Tianli Yu Swaroop Mishra Mihai Surdeanu Rishabh Agarwal Chen-Yu Lee Tomas Pfister 119 0 0 22 Jul 2025
Learning without training: The implicit dynamics of in-context learning Benoit Dherin Michael Munn Hanna Mazzawi Michael Wunder J. Gonzalvo ReLM OffRL LRM 152 12 0 21 Jul 2025
Provable Low-Frequency Bias of In-Context Learning of Representations Yongyi Yang Hidenori Tanaka Wei Hu 174 0 0 17 Jul 2025
CooT: Learning to Coordinate In-Context with Coordination Transformers Huai-Chih Wang Hsiang-Chun Chuang Hsi-Chun Cheng Dai-Jie Wu Shao-Hua Sun OffRL 93 0 0 30 Jun 2025
Latent Concept Disentanglement in Transformer-based Language Models Guan Zhe Hong Bhavya Vasudeva Willie Neiswanger Cyrus Rashtchian Prabhakar Raghavan Rina Panigrahy ReLM LRM 271 2 0 20 Jun 2025
Next-Token Prediction Should be Ambiguity-Sensitive: A Meta-Learning Perspective Léo Gagnon Eric Elmoznino Sarthak Mittal Tom Marty Tejas Kasetty Dhanya Sridhar Guillaume Lajoie 183 0 0 19 Jun 2025
When and How Unlabeled Data Provably Improve In-Context Learning Yingcong Li Xiangyu Chang Muti Kara Xiaofeng Liu Amit K. Roy-Chowdhury Samet Oymak 165 1 0 18 Jun 2025
Brewing Knowledge in Context: Distillation Perspectives on In-Context Learning Chengye Li Haiyun Liu Yuanxi Li 185 0 0 13 Jun 2025
Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods Zhaiming Shen Alexander Hsu Rongjie Lai Wenjing Liao MLT 270 2 0 12 Jun 2025
Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations Yuxin Dong Jiachen Jiang Zhihui Zhu Xia Ning 150 3 0 10 Jun 2025
On Finetuning Tabular Foundation Models Ivan Rubachev Akim Kotelnikov Nikolay Kartashev Artem Babenko 186 4 0 10 Jun 2025
CausalPFN: Amortized Causal Effect Estimation via In-Context Learning Vahid Balazadeh Hamidreza Kamkari Valentin Thomas Benson Li Junwei Ma Jesse C. Cresswell Rahul G. Krishnan CML 146 5 0 09 Jun 2025
Federated In-Context Learning: Iterative Refinement for Improved Answer Quality Ruhan Wang Zhiyong Wang Chengkai Huang Rui Wang Tong Yu Lina Yao John C. S. Lui Dongruo Zhou 136 2 0 09 Jun 2025
Can Biologically Plausible Temporal Credit Assignment Rules Match BPTT for Neural Similarity? E-prop as an Example Yuhan Helena Liu Guangyu Robert Yang Christopher J. Cueva 191 0 0 07 Jun 2025
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks D. Kunin Giovanni Luca Marchetti F. Chen Dhruva Karkada James B. Simon M. DeWeese Surya Ganguli Nina Miolane 273 3 0 06 Jun 2025
Contextually Guided Transformers via Low-Rank Adaptation A. Zhmoginov Jihwan Lee Max Vladymyrov Mark Sandler OffRL 158 0 0 06 Jun 2025
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training J. Oswald Nino Scherrer Seijin Kobayashi Luca Versari Songlin Yang ... Guillaume Lajoie Charlotte Frenkel Razvan Pascanu Blaise Agüera y Arcas João Sacramento 225 12 0 05 Jun 2025
Counterfactual reasoning: an analysis of in-context emergence Moritz Miller Bernhard Schölkopf Siyuan Guo ReLM LRM 311 0 0 05 Jun 2025
Sample Complexity and Representation Ability of Test-time Scaling Paradigms Baihe Huang Shanda Li Tianhao Wu Yiming Yang Ameet Talwalkar Kannan Ramchandran Michael I. Jordan Jiantao Jiao LRM 299 1 0 05 Jun 2025
When can in-context learning generalize out of task distribution? Chase Goddard Lindsay M. Smith Vudtiwat Ngampruetikorn David J. Schwab OOD 102 3 0 05 Jun 2025
A Generative Adaptive Replay Continual Learning Model for Temporal Knowledge Graph Reasoning Annual Meeting of the Association for Computational Linguistics (ACL), 2025 Zhiyu Zhang Wei Chen Youfang Lin Huaiyu Wan OffRL CLL 331 1 0 04 Jun 2025
Relational reasoning and inductive bias in transformers trained on a transitive inference task J. Geerts Stephanie Chan Claudia Clopath Kimberly L. Stachenfeld LRM 146 2 0 04 Jun 2025
Transformers as Multi-task Learners: Decoupling Features in Hidden Markov Models Yifan Hao Chenlu Ye Chi Han Tong Zhang 169 0 0 02 Jun 2025
The Unified Cognitive Consciousness Theory for Language Models: Anchoring Semantics, Thresholds of Activation, and Emergent Reasoning Edward Y. Chang Zeyneb N. Kaya Ethan Chang LRM 257 0 0 02 Jun 2025
Weight-Space Linear Recurrent Neural Networks Roussel Desmond Nzoyem Nawid Keshtmand Enrique Crespo Fernandez Idriss Tsayem Raúl Santos-Rodríguez David A.W. Barton Tom Deakin 256 0 0 01 Jun 2025
From Parameters to Prompts: Understanding and Mitigating the Factuality Gap between Fine-Tuned LLMs Xuan Gong Hanbo Huang Shiyu Liang 177 0 0 29 May 2025
Neither Stochastic Parroting nor AGI: LLMs Solve Tasks through Context-Directed Extrapolation from Training Data Priors Harish Tayyar Madabushi Melissa Torgbi C. Bonial 283 3 0 29 May 2025
The Role of Diversity in In-Context Learning for Large Language Models Wenyang Xiao Haoyu Zhao Lingxiao Huang 301 1 0 26 May 2025
Optimization-Inspired Few-Shot Adaptation for Large Language Models Boyan Gao Xin Wang Jianlong Wu David A. Clifton 216 0 0 25 May 2025
Multi-Scale Manifold Alignment for Interpreting Large Language Models: A Unified Information-Geometric Framework Yukun Zhang Qi Dong 101 0 0 24 May 2025
Understanding Prompt Tuning and In-Context Learning via Meta-Learning Tim Genewein Kevin Wenliang Li Jordi Grau-Moya Anian Ruoss Laurent Orseau Marcus Hutter VPVLM 282 2 0 22 May 2025
Only Large Weights (And Not Skip Connections) Can Prevent the Perils of Rank Collapse Josh Alman Zhao Song 287 9 0 22 May 2025
From Compression to Expression: A Layerwise Analysis of In-Context Learning Jiachen Jiang Yuxin Dong Jinxin Zhou Zhihui Zhu 129 2 0 22 May 2025
Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence Gouki Minegishi Hiroki Furuta Shohei Taniguchi Yusuke Iwasawa Yutaka Matsuo 300 6 0 22 May 2025
Joint Flashback Adaptation for Forgetting-Resistant Instruction Tuning Yukun Zhao Lingyong Yan Zhenyang Li Shuaiqiang Wang Zhumin Chen Zhaochun Ren Dawei Yin CLL KELM VLM LRM 198 0 0 21 May 2025
How Transformers Learn In-Context Recall Tasks? Optimality, Training Dynamics and Generalization Quan Nguyen Thanh Nguyen-Tang MLT 286 1 0 21 May 2025