Transformers learn to implement preconditioned gradient descent for in-context learning (1 June 2023) [ODL]
  Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, Suvrit Sra
  arXiv: 2306.00297 (ArXiv · PDF · HTML)
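As a rough illustration of the paper's central claim (this is not the authors' code): a single linear self-attention head, with suitably chosen weights, reproduces one step of preconditioned gradient descent on the in-context least-squares objective. The minimal NumPy sketch below assumes a noiseless linear-regression prompt; the dimensions, the choice of preconditioner A, and all variable names are assumptions made for the example.

```python
# Minimal sketch (illustrative, not the paper's code): one linear-attention
# head matches one step of preconditioned GD on the in-context objective
#   L(w) = 0.5 * sum_i (w @ x_i - y_i)**2.
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 20                          # illustrative dimensions
X = rng.normal(size=(n, d))           # in-context inputs x_1..x_n
w_star = rng.normal(size=d)           # hidden linear task
y = X @ w_star                        # in-context labels (noiseless)
x_q = rng.normal(size=d)              # query input

# An example (symmetric) preconditioner: the inverse empirical covariance.
A = np.linalg.inv(X.T @ X / n)

# One preconditioned-GD step from w_0 = 0: grad L(0) = -X.T @ y, so
#   w_1 = w_0 - A @ grad L(0) = A @ X.T @ y.
w_1 = A @ X.T @ y
pred_gd = w_1 @ x_q                   # GD prediction at the query

# The same prediction via one linear-attention head: the query attends to
# context token i with unnormalized score x_q^T A x_i (no softmax) and
# aggregates the value y_i.
scores = X @ A @ x_q                  # shape (n,)
pred_attn = scores @ y

print(np.allclose(pred_gd, pred_attn))  # True: the two computations coincide
```

Running the sketch prints True: the head's output at the query token equals the prediction of the one-step preconditioned-GD estimator w_1, which is the correspondence the paper studies.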
Papers citing "Transformers learn to implement preconditioned gradient descent for in-context learning" (showing 50 of 121):
Rethinking Invariance in In-context Learning (08 May 2025)
  Lizhe Fang, Yifei Wang, Khashayar Gatmiry, Lei Fang, Yishuo Wang
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias (02 May 2025)
  Ruiquan Huang, Yingbin Liang, Jing Yang
Exact Learning Dynamics of In-Context Learning in Linear Transformers and Its Application to Non-Linear Transformers (17 Apr 2025)
  Nischal Mainali, Lucas Teixeira
Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations (17 Apr 2025) [HILM]
  Yiyou Sun, Y. Gai, Lijie Chen, Abhilasha Ravichander, Yejin Choi, D. Song
Can Pre-training Indicators Reliably Predict Fine-tuning Outcomes of LLMs? (16 Apr 2025)
  Hansi Zeng, Kai Hui, Honglei Zhuang, Zhen Qin, Zhenrui Yue, Hamed Zamani, Dana Alon
Reasoning without Regret (14 Apr 2025) [OffRL, LRM]
  Tarun Chitra
Gating is Weighting: Understanding Gated Linear Attention through In-context Learning (06 Apr 2025)
  Yingcong Li, Davoud Ataee Tarzanagh, A. S. Rawat, Maryam Fazel, Samet Oymak
Contextualize-then-Aggregate: Circuits for In-Context Learning in Gemma-2 2B (31 Mar 2025)
  Aleksandra Bakalova, Yana Veitsman, Xinting Huang, Michael Hahn
Experience Replay Addresses Loss of Plasticity in Continual Learning (25 Mar 2025) [CLL, KELM]
  Jiuqi Wang, Rohan Chandra, Shangtong Zhang
Transformer-based Wireless Symbol Detection Over Fading Channels (20 Mar 2025)
  Li Fan, Jing Yang, Cong Shen
Test-Time Training Provably Improves Transformers as In-context Learners (14 Mar 2025)
  Halil Alperen Gozeten, M. E. Ildiz, Xuechen Zhang, Mahdi Soltanolkotabi, Marco Mondelli, Samet Oymak
When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective (14 Mar 2025)
  Alireza Mousavi-Hosseini, Clayton Sanford, Denny Wu, Murat A. Erdogdu
Taming Knowledge Conflicts in Language Models (14 Mar 2025) [KELM]
  Gaotang Li, Yuzhong Chen, Hanghang Tong
Provable Benefits of Task-Specific Prompts for In-context Learning (03 Mar 2025)
  Xiangyu Chang, Yingcong Li, Muti Kara, Samet Oymak, A. Roy-Chowdhury
In-Context Learning with Hypothesis-Class Guidance (27 Feb 2025)
  Ziqian Lin, Shubham Kumar Bharti, Kangwook Lee
Towards Auto-Regressive Next-Token Prediction: In-Context Learning Emerges from Generalization (24 Feb 2025)
  Zixuan Gong, Xiaolin Hu, Huayi Tang, Yong Liu
On the Robustness of Transformers against Context Hijacking for Linear Classification (24 Feb 2025)
  Tianle Li, Chenyang Zhang, Xingwu Chen, Yuan Cao, Difan Zou
Ask, and it shall be given: On the Turing completeness of prompting (24 Feb 2025) [ReLM, LRM, AI4CE]
  Ruizhong Qiu, Zhe Xu, W. Bao, Hanghang Tong
CoT-ICL Lab: A Petri Dish for Studying Chain-of-Thought Learning from In-Context Demonstrations (21 Feb 2025)
  Vignesh Kothapalli, Hamed Firooz, Maziar Sanjabi
Vector-ICL: In-context Learning with Continuous Vector Representations (21 Feb 2025)
  Yufan Zhuang, Chandan Singh, Liyuan Liu, Jingbo Shang, Jianfeng Gao
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers (21 Feb 2025)
  Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao-quan Song, Yufa Zhou
Transformers versus the EM Algorithm in Multi-class Clustering (09 Feb 2025)
  Yihan He, Hong-Yu Chen, Yuan Cao, Jianqing Fan, Han Liu
Training Dynamics of In-Context Learning in Linear Attention (28 Jan 2025) [MLT]
  Yedi Zhang, Aaditya K. Singh, Peter E. Latham, Andrew Saxe
Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data? (27 Jan 2025) [LRM, ReLM]
  Yutong Yin, Zhaoran Wang
Understanding Knowledge Hijack Mechanism in In-context Learning through Associative Memory (16 Dec 2024)
  Shuo Wang, Issei Sato
4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion (05 Dec 2024) [DiffM, VGen]
  Chaoyang Wang, Peiye Zhuang, Tuan Duc Ngo, Willi Menapace, Aliaksandr Siarohin, Michael Vasilkovsky, Ivan Skorokhodov, Sergey Tulyakov, Peter Wonka, Hsin-Ying Lee
Re-examining learning linear functions in context (18 Nov 2024)
  Omar Naim, Guilhem Fouilhé, Nicholas Asher
One-Layer Transformer Provably Learns One-Nearest Neighbor In Context (16 Nov 2024) [MLT]
  Zihao Li, Yuan Cao, Cheng Gao, Yihan He, Han Liu, Jason M. Klusowski, Jianqing Fan, Mengdi Wang
Pretrained transformer efficiently learns low-dimensional target functions in-context (04 Nov 2024)
  Kazusato Oko, Yujin Song, Taiji Suzuki, Denny Wu
Abrupt Learning in Transformers: A Case Study on Matrix Completion (29 Oct 2024)
  Pulkit Gopalani, Ekdeep Singh Lubana, Wei Hu
On the Role of Depth and Looping for In-Context Learning with Task Diversity (29 Oct 2024)
  Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar
Provable optimal transport with transformers: The essence of depth and prompt engineering (25 Oct 2024) [OT]
  Hadi Daneshmand
A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration (21 Oct 2024) [LRM]
  Yingqian Cui, Pengfei He, Xianfeng Tang, Qi He, Chen Luo, Jiliang Tang, Yue Xing
Bayesian scaling laws for in-context learning (21 Oct 2024)
  Aryaman Arora, Dan Jurafsky, Christopher Potts, Noah D. Goodman
In-context learning and Occam's razor (17 Oct 2024)
  Eric Elmoznino, Tom Marty, Tejas Kasetty, Léo Gagnon, Sarthak Mittal, Mahan Fathi, Dhanya Sridhar, Guillaume Lajoie
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs (17 Oct 2024)
  Tianyu Guo, Druv Pai, Yu Bai, Jiantao Jiao, Michael I. Jordan, Song Mei
On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery (17 Oct 2024)
  Renpu Liu, Ruida Zhou, Cong Shen, Jing Yang
Context-Scaling versus Task-Scaling in In-Context Learning (16 Oct 2024) [ReLM, LRM]
  Amirhesam Abedsoltan, Adityanarayanan Radhakrishnan, Jingfeng Wu, M. Belkin
On the Training Convergence of Transformers for In-Context Classification (15 Oct 2024)
  Wei Shen, Ruida Zhou, Jing Yang, Cong Shen
A Theoretical Survey on Foundation Models (15 Oct 2024)
  Shi Fu, Yuzhu Chen, Yingjie Wang, Dacheng Tao
Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent (15 Oct 2024)
  Bo Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao-quan Song
Can In-context Learning Really Generalize to Out-of-distribution Tasks? (13 Oct 2024) [OOD]
  Qixun Wang, Yifei Wang, Yisen Wang, Xianghua Ying
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning? (10 Oct 2024)
  Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar
Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition (08 Oct 2024) [LRM]
  Zheyang Xiong, Ziyang Cai, John Cooper, Albert Ge, Vasilis Papageorgiou, ..., Saurabh Agarwal, Grigorios G Chrysos, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos
Transformers learn variable-order Markov chains in-context (07 Oct 2024)
  Ruida Zhou, C. Tian, Suhas Diggavi
Task Diversity Shortens the ICL Plateau (07 Oct 2024) [MoMe]
  Jaeyeon Kim, Sehyun Kwon, Joo Young Choi, Jongho Park, Jaewoong Cho, Jason D. Lee, Ernest K. Ryu
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis (03 Oct 2024) [LRM]
  Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen
Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization (03 Oct 2024)
  Xinhao Yao, Hongjin Qian, Xiaolin Hu, Gengze Xu, Wei Liu, Jian Luan, Bin Wang, Yong-Jin Liu
Towards Understanding the Universality of Transformers for Next-Token Prediction (03 Oct 2024) [CML]
  Michael E. Sander, Gabriel Peyré
Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context (02 Oct 2024) [MLT]
  Spencer Frei, Gal Vardi