Transformers learn in-context by gradient descent
International Conference on Machine Learning (ICML), 2023
15 December 2022 (arXiv:2212.07677)
Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, Max Vladymyrov
Tags: MLT
Links: arXiv (abs) · PDF · HTML · Hugging Face (1 upvote) · GitHub (361★)
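The paper's headline result is constructive: for in-context linear regression, a single softmax-free (linear) self-attention layer with hand-set weights makes the same prediction as one step of gradient descent on the context examples, starting from zero weights. Below is a minimal numpy sketch of that equivalence; the sizes and variable names are illustrative, and the attention layer is reduced to the dot-product form the construction induces rather than a full transformer implementation.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n, lr = 5, 20, 0.1

    # An in-context linear regression task: context pairs (x_j, y_j), one query x_q.
    w_true = rng.normal(size=d)
    X = rng.normal(size=(n, d))   # context inputs
    y = X @ w_true                # context targets
    x_q = rng.normal(size=d)      # query input

    # (a) One gradient-descent step on the in-context least-squares loss,
    #     starting from w = 0: w_1 = (lr / n) * sum_j y_j x_j.
    w_1 = (lr / n) * (y @ X)
    pred_gd = w_1 @ x_q

    # (b) One pass of softmax-free (linear) self-attention whose keys/queries
    #     are the x-parts of the tokens and whose values carry the scaled
    #     targets -- a hand-constructed layer in the spirit of the paper.
    scores = X @ x_q              # <x_j, x_q> for every context token
    pred_attn = (lr / n) * (scores @ y)

    assert np.isclose(pred_gd, pred_attn)  # identical predictions

Both paths compute (lr / n) * sum_j y_j <x_j, x_q>, which is why the forward pass of a suitably weighted linear attention layer can be read as taking a gradient-descent step on the in-context loss.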

Papers citing "Transformers learn in-context by gradient descent"

Showing 50 of 453 citing papers.
Dissecting In-Context Learning of Translations in GPTs
Vikas Raunak, Hany Awadalla, Arul Menezes
24 Oct 2023

Function Vectors in Large Language Models
International Conference on Learning Representations (ICLR), 2023
Eric Todd, Millicent Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, David Bau
23 Oct 2023

Learning to (Learn at Test Time)
Yu Sun, Xinhao Li, Karan Dalal, Chloe Hsu, Oluwasanmi Koyejo, Carlos Guestrin, Xiaolong Wang, Tatsunori Hashimoto, Xinlei Chen
Tags: SSL
20 Oct 2023

On the Optimization and Generalization of Multi-head Attention
Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis
Tags: MLT
19 Oct 2023

Large Language Model for Multi-objective Evolutionary Optimization
International Conference on Evolutionary Multi-Criterion Optimization (EMO), 2023
Fei Liu, Xi Lin, Zhenkun Wang, Shunyu Yao, Xialiang Tong, Mingxuan Yuan, Qingfu Zhang
19 Oct 2023
How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations
International Conference on Learning Representations (ICLR), 2023
Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai
16 Oct 2023

Generative Calibration for In-context Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Zhongtao Jiang, Yuanzhe Zhang, Cao Liu, Jun Zhao, Kang Liu
16 Oct 2023

Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining
International Conference on Learning Representations (ICLR), 2023
Licong Lin, Yu Bai, Song Mei
Tags: OffRL
12 Oct 2023

Do pretrained Transformers Learn In-Context by Gradient Descent?
Lingfeng Shen, Aayush Mishra, Daniel Khashabi
12 Oct 2023

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
International Conference on Learning Representations (ICLR), 2023
Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Peter L. Bartlett
12 Oct 2023
Is attention required for ICL? Exploring the Relationship Between Model Architecture and In-Context Learning Ability
International Conference on Learning Representations (ICLR), 2023
Ivan Lee, Nan Jiang, Taylor Berg-Kirkpatrick
12 Oct 2023

In-Context Unlearning: Language Models as Few Shot Unlearners
International Conference on Machine Learning (ICML), 2023
Martin Pawelczyk, Seth Neel, Himabindu Lakkaraju
Tags: MU
11 Oct 2023

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations
Zeming Wei, Yifei Wang, Ang Li, Yichuan Mo, Yisen Wang
10 Oct 2023

A Meta-Learning Perspective on Transformers for Causal Language Modeling
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Xinbo Wu, Lav Varshney
09 Oct 2023

In-Context Convergence of Transformers
International Conference on Machine Learning (ICML), 2023
Yu Huang, Yuan Cheng, Yingbin Liang
Tags: MLT
08 Oct 2023

The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning
Tian Jin, Nolan Clement, Xin Dong, Vaishnavh Nagarajan, Michael Carbin, Jonathan Ragan-Kelley, Gintare Karolina Dziugaite
Tags: LRM
07 Oct 2023

Fine-tune Language Models to Approximate Unbiased In-context Learning
Timothy Chu, Zhao Song, Chiwun Yang
05 Oct 2023
Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions
International Conference on Learning Representations (ICLR), 2023
S. Bhattamishra, Arkil Patel, Phil Blunsom, Varun Kanade
04 Oct 2023

Linear attention is (maybe) all you need (to understand transformer optimization)
International Conference on Learning Representations (ICLR), 2023
Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, S. Sra
02 Oct 2023

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
International Conference on Learning Representations (ICLR), 2023
Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon Shaolei Du
01 Oct 2023

Decoding In-Context Learning: Neuroscience-inspired Analysis of Representations in Large Language Models
Safoora Yousefi, Leo Betthauser, Hosein Hasanbeig, Raphael Milliere, Ida Momennejad
30 Sep 2023

Understanding In-Context Learning from Repetitions
International Conference on Learning Representations (ICLR), 2023
Jianhao Yan, Jin Xu, Chiyu Song, Chenming Wu, Yafu Li, Yue Zhang
30 Sep 2023
Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency
Zhihan Liu, Hao Hu, Shenao Zhang, Hongyi Guo, Shuqi Ke, Boyi Liu, Zhaoran Wang
Tags: LLMAG, LRM
29 Sep 2023

A Benchmark for Learning to Translate a New Language from One Grammar Book
International Conference on Learning Representations (ICLR), 2023
Garrett Tanzer, Mirac Suzgun, Chenguang Xi, Dan Jurafsky, Luke Melas-Kyriazi
28 Sep 2023

Understanding Catastrophic Forgetting in Language Models via Implicit Inference
International Conference on Learning Representations (ICLR), 2023
Suhas Kotha, Jacob Mitchell Springer, Aditi Raghunathan
Tags: CLL
18 Sep 2023

Context is Environment
International Conference on Learning Representations (ICLR), 2023
Sharut Gupta, Stefanie Jegelka, David Lopez-Paz, Kartik Ahuja
18 Sep 2023

Breaking through the learning plateaus of in-context learning in Transformer
International Conference on Machine Learning (ICML), 2023
Jingwen Fu, Tao Yang, Yuwang Wang, Yan Lu, Nanning Zheng
12 Sep 2023
Uncovering mesa-optimization algorithms in Transformers
Johannes von Oswald, Eyvind Niklasson, Maximilian Schlegel, Seijin Kobayashi, Nicolas Zucchet, ..., Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento
11 Sep 2023
An Appraisal-Based Chain-Of-Emotion Architecture for Affective Language Model Game Agents
PLoS ONE, 2023
Maximilian Croissant, Madeleine Frister, Guy Schofield, Cade McCall
Tags: LLMAG
10 Sep 2023

Are Emergent Abilities in Large Language Models just In-Context Learning?
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Sheng Lu, Irina Bigoulaeva, Rachneet Sachdeva, Harish Tayyar Madabushi, Iryna Gurevych
Tags: LRM, ELM, ReLM
04 Sep 2023
Gated recurrent neural networks discover attention
Nicolas Zucchet, Seijin Kobayashi, Yassir Akram, Johannes von Oswald, Maxime Larcher, Angelika Steger, João Sacramento
04 Sep 2023
Adversarial Fine-Tuning of Language Models: An Iterative Optimisation Approach for the Generation and Detection of Problematic Content
Charles O'Neill, Jack Miller, I. Ciucă, Yuan-Sen Ting, Thang Bui
26 Aug 2023
Causal Intersectionality and Dual Form of Gradient Descent for Multimodal Analysis: a Case Study on Hateful Memes
International Conference on Language Resources and Evaluation (LREC), 2023
Yosuke Miyanishi, Minh Le Nguyen
19 Aug 2023

Inductive-bias Learning: Generating Code Models with Large Language Model
Toma Tanaka, Naofumi Emoto, Tsukasa Yumibayashi
Tags: AI4CE
19 Aug 2023

CausalLM is not optimal for in-context learning
International Conference on Learning Representations (ICLR), 2023
Nan Ding, Tomer Levinboim, Jialin Wu, Sebastian Goodman, Radu Soricut
14 Aug 2023

In-Context Learning Learns Label Relationships but Is Not Conventional Learning
International Conference on Learning Representations (ICLR), 2023
Jannik Kossen, Y. Gal, Tom Rainforth
23 Jul 2023

What can a Single Attention Layer Learn? A Study Through the Random Features Lens
Neural Information Processing Systems (NeurIPS), 2023
Hengyu Fu, Tianyu Guo, Yu Bai, Song Mei
Tags: MLT
21 Jul 2023

SINC: Self-Supervised In-Context Learning for Vision-Language Tasks
IEEE International Conference on Computer Vision (ICCV), 2023
Yi-Syuan Chen, Yun-Zhu Song, Cheng Yu Yeo, Bei Liu, Jianlong Fu, Hong-Han Shuai
Tags: VLM, LRM
15 Jul 2023

Large Language Models as General Pattern Machines
Conference on Robot Learning (CoRL), 2023
Suvir Mirchandani, F. Xia, Peter R. Florence, Brian Ichter, Danny Driess, Montse Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, Andy Zeng
Tags: LLMAG
10 Jul 2023
Graph Neural Networks as an Enabler of Terahertz-based Flow-guided Nanoscale Localization over Highly Erroneous Raw Data
IEEE Journal on Selected Areas in Communications (JSAC), 2023
Gerard Calvo Bartra, Filip Lemic, Guillem Pascual, S. Abadal, Jakob Struye, Carmen Delgado, Xavier Costa Pérez
09 Jul 2023

Bidirectional Attention as a Mixture of Continuous Word Experts
Conference on Uncertainty in Artificial Intelligence (UAI), 2023
Kevin Christian Wibisono, Yixin Wang
Tags: MoE
08 Jul 2023

One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention
International Conference on Learning Representations (ICLR), 2023
Arvind V. Mahankali, Tatsunori B. Hashimoto, Tengyu Ma
Tags: MLT
07 Jul 2023

Scaling In-Context Demonstrations with Structured Attention
Tianle Cai, Kaixuan Huang, Jason D. Lee, Mengdi Wang
Tags: LRM
05 Jul 2023

Trainable Transformer in Transformer
International Conference on Machine Learning (ICML), 2023
A. Panigrahi, Sadhika Malladi, Mengzhou Xia, Sanjeev Arora
Tags: VLM
03 Jul 2023

Understanding In-Context Learning via Supportive Pretraining Data
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Xiaochuang Han, Daniel Simig, Todor Mihaylov, Yulia Tsvetkov, Asli Celikyilmaz, Tianlu Wang
Tags: AIMat
26 Jun 2023

Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression
Neural Information Processing Systems (NeurIPS), 2023
Allan Raventós, Mansheej Paul, F. Chen, Surya Ganguli
26 Jun 2023

Supervised Pretraining Can Learn In-Context Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2023
Jonathan Lee, Annie Xie, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill
Tags: OffRL
26 Jun 2023

SynerGPT: In-Context Learning for Personalized Drug Synergy Prediction and Drug Design
bioRxiv, 2023
Carl Edwards, Aakanksha Naik, Tushar Khot, Martin D. Burke, Heng Ji, Kyle Lo
19 Jun 2023

Trained Transformers Learn Linear Models In-Context
Journal of Machine Learning Research (JMLR), 2023
Ruiqi Zhang, Spencer Frei, Peter L. Bartlett
16 Jun 2023

TART: A plug-and-play Transformer module for task-agnostic reasoning
Neural Information Processing Systems (NeurIPS), 2023
Kush S. Bhatia, A. Narayan, Chris De Sa, Christopher Ré
Tags: LRM, ReLM, VLM
13 Jun 2023