arXiv: 2212.07677
Transformers learn in-context by gradient descent

International Conference on Machine Learning (ICML), 2022
15 December 2022
J. Oswald
Eyvind Niklasson
E. Randazzo
João Sacramento
A. Mordvintsev
A. Zhmoginov
Max Vladymyrov
    MLT

Papers citing "Transformers learn in-context by gradient descent"

50 / 453 papers shown
Title
Statistical Advantage of Softmax Attention: Insights from Single-Location Regression
O. Duranthon
P. Marion
C. Boyer
B. Loureiro
L. Zdeborová
88
0
0
26 Sep 2025
Towards Generalizable Implicit In-Context Learning with Attention Routing
Jiaqian Li
Yanshu Li
Ligong Han
Ruixiang Tang
Wenya Wang
80
0
0
26 Sep 2025
On Theoretical Interpretations of Concept-Based In-Context Learning
Huaze Tang
Tianren Peng
Shao-Lun Huang
137
0
0
25 Sep 2025
A circuit for predicting hierarchical structure in-context in Large Language Models
Tankred Saanum
Can Demircan
Samuel Gershman
Eric Schulz
84
0
0
25 Sep 2025
Linear Transformers Implicitly Discover Unified Numerical Algorithms
Patrick Lutz
Aditya Gangrade
Hadi Daneshmand
Venkatesh Saligrama
40
0
0
24 Sep 2025
Asymptotic Study of In-context Learning with Random Transformers through Equivalent Models
Samet Demir
Zafer Dogan
80
2
0
18 Sep 2025
Selective Induction Heads: How Transformers Select Causal Structures In Context
International Conference on Learning Representations (ICLR), 2025
Francesco D'Angelo
Francesco Croce
Nicolas Flammarion
76
4
0
09 Sep 2025
InSQuAD: In-Context Learning for Efficient Retrieval via Submodular Mutual Information to Enforce Quality and Diversity
Souradeep Nanda
Anay Majee
Rishabh K. Iyer
59
0
0
28 Aug 2025
Just-in-time and distributed task representations in language models
Yuxuan Li
Declan Campbell
Stephanie Chan
Andrew Kyle Lampinen
172
1
0
28 Aug 2025
Fast weight programming and linear transformers: from machine learning to neurobiology
Kazuki Irie
Samuel J. Gershman
104
0
0
11 Aug 2025
Towards Theoretical Understanding of Transformer Test-Time Computing: Investigation on In-Context Linear Regression
Xingwu Chen
Miao Lu
Beining Wu
Difan Zou
113
0
0
11 Aug 2025
From Text to Trajectories: GPT-2 as an ODE Solver via In-Context
Ziyang Ma
Baojian Zhou
Deqing Yang
Yanghua Xiao
96
0
0
05 Aug 2025
Transformers in Pseudo-Random Number Generation: A Dual Perspective on Theory and Practice
Ran Li
Lingshu Zeng
94
0
0
02 Aug 2025
Provable In-Context Learning of Nonlinear Regression with Transformers
Hongbo Li
Lingjie Duan
Yingbin Liang
119
1
0
28 Jul 2025
Towards Compute-Optimal Many-Shot In-Context Learning
Shahriar Golchin
Yanfei Chen
Rujun Han
Manan Gandhi
Tianli Yu
Swaroop Mishra
Mihai Surdeanu
Rishabh Agarwal
Chen-Yu Lee
Tomas Pfister
119
0
0
22 Jul 2025
Learning without training: The implicit dynamics of in-context learning
Benoit Dherin
Michael Munn
Hanna Mazzawi
Michael Wunder
J. Gonzalvo
ReLM OffRL LRM
152
12
0
21 Jul 2025
Provable Low-Frequency Bias of In-Context Learning of Representations
Yongyi Yang
Hidenori Tanaka
Wei Hu
174
0
0
17 Jul 2025
CooT: Learning to Coordinate In-Context with Coordination Transformers
Huai-Chih Wang
Hsiang-Chun Chuang
Hsi-Chun Cheng
Dai-Jie Wu
Shao-Hua Sun
OffRL
93
0
0
30 Jun 2025
Latent Concept Disentanglement in Transformer-based Language Models
Guan Zhe Hong
Bhavya Vasudeva
Willie Neiswanger
Cyrus Rashtchian
Prabhakar Raghavan
Rina Panigrahy
ReLM LRM
271
2
0
20 Jun 2025
Next-Token Prediction Should be Ambiguity-Sensitive: A Meta-Learning Perspective
Léo Gagnon
Eric Elmoznino
Sarthak Mittal
Tom Marty
Tejas Kasetty
Dhanya Sridhar
Guillaume Lajoie
183
0
0
19 Jun 2025
When and How Unlabeled Data Provably Improve In-Context Learning
Yingcong Li
Xiangyu Chang
Muti Kara
Xiaofeng Liu
Amit K. Roy-Chowdhury
Samet Oymak
165
1
0
18 Jun 2025
Brewing Knowledge in Context: Distillation Perspectives on In-Context Learning
Chengye Li
Haiyun Liu
Yuanxi Li
185
0
0
13 Jun 2025
Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods
Zhaiming Shen
Alexander Hsu
Rongjie Lai
Wenjing Liao
MLT
270
2
0
12 Jun 2025
Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations
Yuxin Dong
Jiachen Jiang
Zhihui Zhu
Xia Ning
150
3
0
10 Jun 2025
On Finetuning Tabular Foundation Models
Ivan Rubachev
Akim Kotelnikov
Nikolay Kartashev
Artem Babenko
186
4
0
10 Jun 2025
CausalPFN: Amortized Causal Effect Estimation via In-Context Learning
Vahid Balazadeh
Hamidreza Kamkari
Valentin Thomas
Benson Li
Junwei Ma
Jesse C. Cresswell
Rahul G. Krishnan
CML
146
5
0
09 Jun 2025
Federated In-Context Learning: Iterative Refinement for Improved Answer Quality
Ruhan Wang
Zhiyong Wang
Chengkai Huang
Rui Wang
Tong Yu
Lina Yao
John C. S. Lui
Dongruo Zhou
136
2
0
09 Jun 2025
Can Biologically Plausible Temporal Credit Assignment Rules Match BPTT for Neural Similarity? E-prop as an Example
Yuhan Helena Liu
Guangyu Robert Yang
Christopher J. Cueva
191
0
0
07 Jun 2025
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
D. Kunin
Giovanni Luca Marchetti
F. Chen
Dhruva Karkada
James B. Simon
M. DeWeese
Surya Ganguli
Nina Miolane
273
3
0
06 Jun 2025
Contextually Guided Transformers via Low-Rank Adaptation
A. Zhmoginov
Jihwan Lee
Max Vladymyrov
Mark Sandler
OffRL
158
0
0
06 Jun 2025
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training
J. Oswald
Nino Scherrer
Seijin Kobayashi
Luca Versari
Songlin Yang
...
Guillaume Lajoie
Charlotte Frenkel
Razvan Pascanu
Blaise Agüera y Arcas
João Sacramento
225
12
0
05 Jun 2025
Counterfactual reasoning: an analysis of in-context emergence
Moritz Miller
Bernhard Schölkopf
Siyuan Guo
ReLM LRM
311
0
0
05 Jun 2025
Sample Complexity and Representation Ability of Test-time Scaling Paradigms
Baihe Huang
Shanda Li
Tianhao Wu
Yiming Yang
Ameet Talwalkar
Kannan Ramchandran
Michael I. Jordan
Jiantao Jiao
LRM
299
1
0
05 Jun 2025
When can in-context learning generalize out of task distribution?
Chase Goddard
Lindsay M. Smith
Vudtiwat Ngampruetikorn
David J. Schwab
OOD
102
3
0
05 Jun 2025
A Generative Adaptive Replay Continual Learning Model for Temporal Knowledge Graph Reasoning
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhiyu Zhang
Wei Chen
Youfang Lin
Huaiyu Wan
OffRL CLL
331
1
0
04 Jun 2025
Relational reasoning and inductive bias in transformers trained on a transitive inference task
J. Geerts
Stephanie Chan
Claudia Clopath
Kimberly L. Stachenfeld
LRM
146
2
0
04 Jun 2025
Transformers as Multi-task Learners: Decoupling Features in Hidden Markov Models
Yifan Hao
Chenlu Ye
Chi Han
Tong Zhang
169
0
0
02 Jun 2025
The Unified Cognitive Consciousness Theory for Language Models: Anchoring Semantics, Thresholds of Activation, and Emergent Reasoning
Edward Y. Chang
Zeyneb N. Kaya
Ethan Chang
LRM
257
0
0
02 Jun 2025
Weight-Space Linear Recurrent Neural Networks
Roussel Desmond Nzoyem
Nawid Keshtmand
Enrique Crespo Fernandez
Idriss Tsayem
Raúl Santos-Rodríguez
David A.W. Barton
Tom Deakin
256
0
0
01 Jun 2025
From Parameters to Prompts: Understanding and Mitigating the Factuality Gap between Fine-Tuned LLMs
Xuan Gong
Hanbo Huang
Shiyu Liang
177
0
0
29 May 2025
Neither Stochastic Parroting nor AGI: LLMs Solve Tasks through Context-Directed Extrapolation from Training Data Priors
Harish Tayyar Madabushi
Melissa Torgbi
C. Bonial
283
3
0
29 May 2025
The Role of Diversity in In-Context Learning for Large Language Models
Wenyang Xiao
Haoyu Zhao
Lingxiao Huang
301
1
0
26 May 2025
Optimization-Inspired Few-Shot Adaptation for Large Language Models
Boyan Gao
Xin Wang
Jianlong Wu
David A. Clifton
216
0
0
25 May 2025
Multi-Scale Manifold Alignment for Interpreting Large Language Models: A Unified Information-Geometric Framework
Yukun Zhang
Qi Dong
101
0
0
24 May 2025
Understanding Prompt Tuning and In-Context Learning via Meta-Learning
Tim Genewein
Kevin Wenliang Li
Jordi Grau-Moya
Anian Ruoss
Laurent Orseau
Marcus Hutter
VPVLM
282
2
0
22 May 2025
Only Large Weights (And Not Skip Connections) Can Prevent the Perils of Rank Collapse
Josh Alman
Zhao Song
287
9
0
22 May 2025
From Compression to Expression: A Layerwise Analysis of In-Context Learning
Jiachen Jiang
Yuxin Dong
Jinxin Zhou
Zhihui Zhu
129
2
0
22 May 2025
Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence
Gouki Minegishi
Hiroki Furuta
Shohei Taniguchi
Yusuke Iwasawa
Yutaka Matsuo
300
6
0
22 May 2025
Joint Flashback Adaptation for Forgetting-Resistant Instruction Tuning
Yukun Zhao
Lingyong Yan
Zhenyang Li
Shuaiqiang Wang
Zhumin Chen
Zhaochun Ren
Dawei Yin
CLL KELM VLM LRM
198
0
0
21 May 2025
How Transformers Learn In-Context Recall Tasks? Optimality, Training Dynamics and Generalization
Quan Nguyen
Thanh Nguyen-Tang
MLT
286
1
0
21 May 2025