Transformers learn to implement preconditioned gradient descent for in-context learning

Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, S. Sra
1 June 2023 (arXiv:2306.00297)

Papers citing "Transformers learn to implement preconditioned gradient descent for in-context learning"

Showing 50 of 121 citing papers.
Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models
Can Demircan, Tankred Saanum, A. Jagadish, Marcel Binz, Eric Schulz
02 Oct 2024

Attention layers provably solve single-location regression
P. Marion, Raphael Berthier, Gérard Biau, Claire Boyer
02 Oct 2024

Transformers Handle Endogeneity in In-Context Linear Regression
Haodong Liang, Krishnakumar Balasubramanian, Lifeng Lai
02 Oct 2024

Non-asymptotic Convergence of Training Transformers for Next-token Prediction
Ruiquan Huang, Yingbin Liang, Jing Yang
25 Sep 2024

Provable In-Context Learning of Linear Systems and Linear Elliptic PDEs with Transformers
Frank Cole, Yulong Lu, Riley O'Neill, Tianhao Zhang
18 Sep 2024

Transformers are Minimax Optimal Nonparametric In-Context Learners
Juno Kim, Tai Nakamaki, Taiji Suzuki
22 Aug 2024

How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression
Xingwu Chen, Lei Zhao, Difan Zou
08 Aug 2024

Transformers are Universal In-context Learners
Takashi Furuya, Maarten V. de Hoop, Gabriel Peyré
02 Aug 2024

Representing Rule-based Chatbots with Transformers
Dan Friedman, Abhishek Panigrahi, Danqi Chen
15 Jul 2024

Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond
Yingcong Li, A. S. Rawat, Samet Oymak
13 Jul 2024

HiPPO-Prophecy: State-Space Models can Provably Learn Dynamical Systems in Context
Federico Arangath Joseph, K. Haefeli, Noah Liniger, Çağlar Gülçehre
12 Jul 2024

Distributed Rule Vectors is A Key Mechanism in Large Language Models' In-Context Learning
Bowen Zheng, Ming Ma, Zhongqiao Lin, Tianming Yang
23 Jun 2024

Probing the Decision Boundaries of In-context Learning in Large Language Models
Siyan Zhao, Tung Nguyen, Aditya Grover
17 Jun 2024

Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective
Xinhao Yao, Xiaolin Hu, Shenzhi Yang, Yong Liu
06 Jun 2024

Universal In-Context Approximation By Prompting Fully Recurrent Models
Aleksandar Petrov, Tom A. Lamb, Alasdair Paren, Philip H. S. Torr, Adel Bibi
03 Jun 2024

Why Larger Language Models Do In-context Learning Differently?
Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang
30 May 2024

A Theoretical Understanding of Self-Correction through In-context Alignment
Yifei Wang, Yuyang Wu, Zeming Wei, Stefanie Jegelka, Yisen Wang
28 May 2024

IM-Context: In-Context Learning for Imbalanced Regression Tasks
Ismail Nejjar, Faez Ahmed, Olga Fink
28 May 2024

On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability
Chenyu Zheng, Wei Huang, Rongzheng Wang, Guoqiang Wu, Jun Zhu, Chongxuan Li
27 May 2024

Automatic Domain Adaptation by Transformers in In-Context Learning
Ryuichiro Hataya, Kota Matsui, Masaaki Imaizumi
27 May 2024

On Understanding Attention-Based In-Context Learning for Categorical Data
Aaron T. Wang, William Convertino, Xiang Cheng, Ricardo Henao, Lawrence Carin
27 May 2024

Towards Better Understanding of In-Context Learning Ability from In-Context Uncertainty Quantification
Shang Liu, Zhongze Cai, Guanting Chen, Xiaocheng Li
24 May 2024

DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning
Zijian Zhou, Xiaoqiang Lin, Xinyi Xu, Alok Prakash, Daniela Rus, K. H. Low
22 May 2024

Asymptotic theory of in-context learning by linear attention
Yue M. Lu, Mary I. Letey, Jacob A. Zavatone-Veth, Anindita Maiti, C. Pehlevan
20 May 2024

From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples
Robert Vacareanu, Vlad-Andrei Negru, Vasile Suciu, Mihai Surdeanu
11 Apr 2024

Can large language models explore in-context?
Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins
22 Mar 2024

Transfer Learning Beyond Bounded Density Ratios
Alkis Kalavasis, Ilias Zadik, Manolis Zampetakis
18 Mar 2024

How Well Can Transformers Emulate In-context Newton's Method?
Angeliki Giannou, Liu Yang, Tianhao Wang, Dimitris Papailiopoulos, Jason D. Lee
05 Mar 2024

How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?
Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen
23 Feb 2024

In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization
Ruiqi Zhang, Jingfeng Wu, Peter L. Bartlett
22 Feb 2024

Linear Transformers are Versatile In-Context Learners
Max Vladymyrov, J. Oswald, Mark Sandler, Rong Ge
21 Feb 2024

How do Transformers perform In-Context Autoregressive Learning?
Michael E. Sander, Raja Giryes, Taiji Suzuki, Mathieu Blondel, Gabriel Peyré
08 Feb 2024

Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Jongho Park, Jaeseung Park, Zheyang Xiong, Nayoung Lee, Jaewoong Cho, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos
06 Feb 2024

Is Mamba Capable of In-Context Learning?
Riccardo Grazzi, Julien N. Siems, Simon Schrodi, Thomas Brox, Frank Hutter
05 Feb 2024

Can MLLMs Perform Text-to-Image In-Context Learning?
Yuchen Zeng, Wonjun Kang, Yicong Chen, Hyung Il Koo, Kangwook Lee
02 Feb 2024

Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape
Juno Kim, Taiji Suzuki
02 Feb 2024

Theoretical Understanding of In-Context Learning in Shallow Transformers with Unstructured Data
Yue Xing, Xiaofeng Lin, Chenheng Xu, Namjoon Suh, Qifan Song, Guang Cheng
01 Feb 2024

Superiority of Multi-Head Attention in In-Context Linear Regression
Yingqian Cui, Jie Ren, Pengfei He, Jiliang Tang, Yue Xing
30 Jan 2024

An Information-Theoretic Analysis of In-Context Learning
Hong Jun Jeon, Jason D. Lee, Qi Lei, Benjamin Van Roy
28 Jan 2024

Anchor function: a type of benchmark functions for studying language models
Zhongwang Zhang, Zhiwei Wang, Junjie Yao, Zhangchen Zhou, Xiaolong Li, E. Weinan, Z. Xu
16 Jan 2024

Setting the Record Straight on Transformer Oversmoothing
G. Dovonon, M. Bronstein, Matt J. Kusner
09 Jan 2024

Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context
Xiang Cheng, Yuxin Chen, S. Sra
11 Dec 2023

The mechanistic basis of data dependence and abrupt learning in an in-context classification task
Gautam Reddy
03 Dec 2023

Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona, Robert P. Dick, Hidenori Tanaka
21 Nov 2023

Looped Transformers are Better at Learning Learning Algorithms
Liu Yang, Kangwook Lee, Robert D. Nowak, Dimitris Papailiopoulos
21 Nov 2023

In-context Learning and Gradient Descent Revisited
Gilad Deutch, Nadav Magar, Tomer Bar Natan, Guy Dar
13 Nov 2023

Transformers are Provably Optimal In-context Estimators for Wireless Communications
Vishnu Teja Kunde, Vicram Rajagopalan, Chandra Shekhara Kaushik Valmeekam, Krishna R. Narayanan, S. Shakkottai, D. Kalathil, J. Chamberland
01 Nov 2023

In-Context Learning Dynamics with Random Binary Sequences
Eric J. Bigelow, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, T. Ullman
26 Oct 2023

The Expressive Power of Low-Rank Adaptation
Yuchen Zeng, Kangwook Lee
26 Oct 2023

Learning to (Learn at Test Time)
Yu Sun, Xinhao Li, Karan Dalal, Chloe Hsu, Oluwasanmi Koyejo, Carlos Guestrin, Xiaolong Wang, Tatsunori Hashimoto, Xinlei Chen
20 Oct 2023