ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2212.07677
  4. Cited By
Transformers learn in-context by gradient descent

Transformers learn in-context by gradient descent

15 December 2022
J. Oswald
Eyvind Niklasson
E. Randazzo
João Sacramento
A. Mordvintsev
A. Zhmoginov
Max Vladymyrov
    MLT
ArXivPDFHTML

Papers citing "Transformers learn in-context by gradient descent"

50 / 69 papers shown
Title
On the generalization of language models from in-context learning and finetuning: a controlled study
On the generalization of language models from in-context learning and finetuning: a controlled study
Andrew Kyle Lampinen
Arslan Chaudhry
Stephanie Chan
Cody Wild
Diane Wan
Alex Ku
Jorg Bornschein
Razvan Pascanu
Murray Shanahan
James L. McClelland
46
0
0
01 May 2025
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks
Vishnu Sarukkai
Zhiqiang Xie
Kayvon Fatahalian
LLMAG
68
0
0
01 May 2025
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
Hongkang Li
Yihua Zhang
Shuai Zhang
M. Wang
Sijia Liu
Pin-Yu Chen
MoMe
57
2
0
15 Apr 2025
An extension of linear self-attention for in-context learning
An extension of linear self-attention for in-context learning
Katsuyuki Hagiwara
34
0
0
31 Mar 2025
L0-Reasoning Bench: Evaluating Procedural Correctness in Language Models via Simple Program Execution
L0-Reasoning Bench: Evaluating Procedural Correctness in Language Models via Simple Program Execution
Simeng Sun
Cheng-Ping Hsieh
Faisal Ladhak
Erik Arakelyan
Santiago Akle Serano
Boris Ginsburg
ReLM
ELM
LRM
56
0
0
28 Mar 2025
Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning
Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning
Anh Tong
Thanh Nguyen-Tang
Dongeun Lee
Duc Nguyen
Toan M. Tran
David Hall
Cheongwoong Kang
Jaesik Choi
33
0
0
03 Mar 2025
In-Context Learning with Hypothesis-Class Guidance
In-Context Learning with Hypothesis-Class Guidance
Ziqian Lin
Shubham Kumar Bharti
Kangwook Lee
64
0
0
27 Feb 2025
CoT-ICL Lab: A Petri Dish for Studying Chain-of-Thought Learning from In-Context Demonstrations
CoT-ICL Lab: A Petri Dish for Studying Chain-of-Thought Learning from In-Context Demonstrations
Vignesh Kothapalli
Hamed Firooz
Maziar Sanjabi
55
0
0
21 Feb 2025
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
Yingyu Liang
Zhizhou Sha
Zhenmei Shi
Zhao-quan Song
Yufa Zhou
89
18
0
21 Feb 2025
Vector-ICL: In-context Learning with Continuous Vector Representations
Vector-ICL: In-context Learning with Continuous Vector Representations
Yufan Zhuang
Chandan Singh
Liyuan Liu
Jingbo Shang
Jianfeng Gao
52
3
0
21 Feb 2025
Zero-shot Model-based Reinforcement Learning using Large Language Models
Zero-shot Model-based Reinforcement Learning using Large Language Models
Abdelhakim Benechehab
Youssef Attia El Hili
Ambroise Odonnat
Oussama Zekri
Albert Thomas
Giuseppe Paolo
Maurizio Filippone
I. Redko
Balázs Kégl
OffRL
57
1
0
17 Feb 2025
Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?
Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?
Yutong Yin
Zhaoran Wang
LRM
ReLM
50
0
0
27 Jan 2025
Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Jiajun Song
Zhuoyan Xu
Yiqiao Zhong
75
4
0
31 Dec 2024
ICLR: In-Context Learning of Representations
ICLR: In-Context Learning of Representations
Core Francisco Park
Andrew Lee
Ekdeep Singh Lubana
Yongyi Yang
Maya Okawa
Kento Nishi
Martin Wattenberg
Hidenori Tanaka
AIFin
111
3
0
29 Dec 2024
Revisiting In-Context Learning with Long Context Language Models
Revisiting In-Context Learning with Long Context Language Models
Jinheon Baek
Sun Jae Lee
Prakhar Gupta
Geunseob
Oh
Siddharth Dalmia
98
0
0
22 Dec 2024
Toward Understanding In-context vs. In-weight Learning
Toward Understanding In-context vs. In-weight Learning
Bryan Chan
Xinyi Chen
András Gyorgy
Dale Schuurmans
65
3
0
30 Oct 2024
In-context learning and Occam's razor
In-context learning and Occam's razor
Eric Elmoznino
Tom Marty
Tejas Kasetty
Léo Gagnon
Sarthak Mittal
Mahan Fathi
Dhanya Sridhar
Guillaume Lajoie
32
1
0
17 Oct 2024
On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
Renpu Liu
Ruida Zhou
Cong Shen
Jing Yang
23
0
0
17 Oct 2024
Context-Scaling versus Task-Scaling in In-Context Learning
Context-Scaling versus Task-Scaling in In-Context Learning
Amirhesam Abedsoltan
Adityanarayanan Radhakrishnan
Jingfeng Wu
M. Belkin
ReLM
LRM
32
3
0
16 Oct 2024
State-space models can learn in-context by gradient descent
State-space models can learn in-context by gradient descent
Neeraj Mohan Sushma
Yudou Tian
Harshvardhan Mestha
Nicolo Colombo
David Kappel
Anand Subramoney
35
3
0
15 Oct 2024
Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent
Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent
Bo Chen
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao-quan Song
80
19
0
15 Oct 2024
ELICIT: LLM Augmentation via External In-Context Capability
ELICIT: LLM Augmentation via External In-Context Capability
Futing Wang
Jianhao Yan
Yue Zhang
Tao Lin
35
0
0
12 Oct 2024
Zero-Shot Learning of Causal Models
Zero-Shot Learning of Causal Models
Divyat Mahajan
Jannes Gladrow
Agrin Hilmkil
Cheng Zhang
M. Scetbon
28
1
0
08 Oct 2024
Density estimation with LLMs: a geometric investigation of in-context learning trajectories
Density estimation with LLMs: a geometric investigation of in-context learning trajectories
Toni J. B. Liu
Nicolas Boullé
Raphael Sarfati
Christopher Earls
20
0
0
07 Oct 2024
In-context Learning in Presence of Spurious Correlations
In-context Learning in Presence of Spurious Correlations
Hrayr Harutyunyan
R. Darbinyan
Samvel Karapetyan
Hrant Khachatrian
LRM
30
1
0
04 Oct 2024
Towards Understanding the Universality of Transformers for Next-Token Prediction
Towards Understanding the Universality of Transformers for Next-Token Prediction
Michael E. Sander
Gabriel Peyré
CML
29
0
0
03 Oct 2024
Mitigating Copy Bias in In-Context Learning through Neuron Pruning
Mitigating Copy Bias in In-Context Learning through Neuron Pruning
Ameen Ali
Lior Wolf
Ivan Titov
27
2
0
02 Oct 2024
Racing Thoughts: Explaining Contextualization Errors in Large Language Models
Racing Thoughts: Explaining Contextualization Errors in Large Language Models
Michael A. Lepori
Michael Mozer
Asma Ghandeharioun
LRM
80
1
0
02 Oct 2024
Transformers Handle Endogeneity in In-Context Linear Regression
Transformers Handle Endogeneity in In-Context Linear Regression
Haodong Liang
Krishnakumar Balasubramanian
Lifeng Lai
32
1
0
02 Oct 2024
Attention layers provably solve single-location regression
Attention layers provably solve single-location regression
P. Marion
Raphael Berthier
Gérard Biau
Claire Boyer
51
2
0
02 Oct 2024
On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
Kevin Xu
Issei Sato
37
3
0
02 Oct 2024
Spin glass model of in-context learning
Spin glass model of in-context learning
Yuhao Li
Ruoran Bai
Haiping Huang
LRM
37
0
0
05 Aug 2024
Representing Rule-based Chatbots with Transformers
Representing Rule-based Chatbots with Transformers
Dan Friedman
Abhishek Panigrahi
Danqi Chen
56
1
0
15 Jul 2024
Towards Understanding Multi-Task Learning (Generalization) of LLMs via Detecting and Exploring Task-Specific Neurons
Towards Understanding Multi-Task Learning (Generalization) of LLMs via Detecting and Exploring Task-Specific Neurons
Yongqi Leng
Deyi Xiong
32
5
0
09 Jul 2024
Expressivity of Neural Networks with Random Weights and Learned Biases
Expressivity of Neural Networks with Random Weights and Learned Biases
Ezekiel Williams
Avery Hee-Woon Ryoo
Thomas Jiralerspong
Alexandre Payeur
M. Perich
Luca Mazzucato
Guillaume Lajoie
24
1
0
01 Jul 2024
On the Transformations across Reward Model, Parameter Update, and
  In-Context Prompt
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt
Deng Cai
Huayang Li
Tingchen Fu
Siheng Li
Weiwen Xu
...
Leyang Cui
Yan Wang
Lemao Liu
Taro Watanabe
Shuming Shi
KELM
26
2
0
24 Jun 2024
Distributed Rule Vectors is A Key Mechanism in Large Language Models'
  In-Context Learning
Distributed Rule Vectors is A Key Mechanism in Large Language Models' In-Context Learning
Bowen Zheng
Ming Ma
Zhongqiao Lin
Tianming Yang
21
1
0
23 Jun 2024
On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning
On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning
Franz Nowak
Anej Svete
Alexandra Butoi
Ryan Cotterell
ReLM
LRM
44
12
0
20 Jun 2024
Towards Better Understanding of In-Context Learning Ability from
  In-Context Uncertainty Quantification
Towards Better Understanding of In-Context Learning Ability from In-Context Uncertainty Quantification
Shang Liu
Zhongze Cai
Guanting Chen
Xiaocheng Li
UQCV
38
1
0
24 May 2024
Implicit In-context Learning
Implicit In-context Learning
Zhuowei Li
Zihao Xu
Ligong Han
Yunhe Gao
Song Wen
Di Liu
Hao Wang
Dimitris N. Metaxas
38
1
0
23 May 2024
Asymptotic theory of in-context learning by linear attention
Asymptotic theory of in-context learning by linear attention
Yue M. Lu
Mary I. Letey
Jacob A. Zavatone-Veth
Anindita Maiti
C. Pehlevan
19
10
0
20 May 2024
In-Context Learning with Long-Context Models: An In-Depth Exploration
In-Context Learning with Long-Context Models: An In-Depth Exploration
Amanda Bertsch
Maor Ivgi
Uri Alon
Jonathan Berant
Matthew R. Gormley
Matthew R. Gormley
Graham Neubig
ReLM
AIMat
81
65
0
30 Apr 2024
LLM Task Interference: An Initial Study on the Impact of Task-Switch in
  Conversational History
LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History
Akash Gupta
Ivaxi Sheth
Vyas Raina
Mark J. F. Gales
Mario Fritz
30
4
0
28 Feb 2024
Linear Transformers are Versatile In-Context Learners
Linear Transformers are Versatile In-Context Learners
Max Vladymyrov
J. Oswald
Mark Sandler
Rong Ge
24
13
0
21 Feb 2024
Implicit Bias and Fast Convergence Rates for Self-attention
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva
Puneesh Deora
Christos Thrampoulidis
24
13
0
08 Feb 2024
An Information-Theoretic Analysis of In-Context Learning
An Information-Theoretic Analysis of In-Context Learning
Hong Jun Jeon
Jason D. Lee
Qi Lei
Benjamin Van Roy
13
18
0
28 Jan 2024
Setting the Record Straight on Transformer Oversmoothing
Setting the Record Straight on Transformer Oversmoothing
G. Dovonon
M. Bronstein
Matt J. Kusner
20
5
0
09 Jan 2024
The mechanistic basis of data dependence and abrupt learning in an
  in-context classification task
The mechanistic basis of data dependence and abrupt learning in an in-context classification task
Gautam Reddy
17
48
0
03 Dec 2023
Compositional Capabilities of Autoregressive Transformers: A Study on
  Synthetic, Interpretable Tasks
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
Rahul Ramesh
Ekdeep Singh Lubana
Mikail Khona
Robert P. Dick
Hidenori Tanaka
CoGe
22
6
0
21 Nov 2023
Transformers are Provably Optimal In-context Estimators for Wireless Communications
Transformers are Provably Optimal In-context Estimators for Wireless Communications
Vishnu Teja Kunde
Vicram Rajagopalan
Chandra Shekhara Kaushik Valmeekam
Krishna R. Narayanan
S. Shakkottai
D. Kalathil
J. Chamberland
29
4
0
01 Nov 2023
12
Next