Transformers learn in-context by gradient descent
International Conference on Machine Learning (ICML), 2023
J. Oswald, Eyvind Niklasson, E. Randazzo, João Sacramento, A. Mordvintsev, A. Zhmoginov, Max Vladymyrov
arXiv:2212.07677 (v2, latest) · 15 December 2022 · MLT
Links: arXiv (abs) · PDF · HTML · HuggingFace (1 upvote) · GitHub (361★)
Papers citing "Transformers learn in-context by gradient descent" (50 of 453 papers shown)
Dissecting In-Context Learning of Translations in GPTs
Vikas Raunak, Hany Awadalla, Arul Menezes
24 Oct 2023

Function Vectors in Large Language Models
International Conference on Learning Representations (ICLR), 2023
Eric Todd, Millicent Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, David Bau
23 Oct 2023

Learning to (Learn at Test Time)
Yu Sun, Xinhao Li, Karan Dalal, Chloe Hsu, Oluwasanmi Koyejo, Carlos Guestrin, Xiaolong Wang, Tatsunori Hashimoto, Xinlei Chen
SSL · 20 Oct 2023

On the Optimization and Generalization of Multi-head Attention
Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis
MLT · 19 Oct 2023

Large Language Model for Multi-objective Evolutionary Optimization
International Conference on Evolutionary Multi-Criterion Optimization (EMO), 2023
Fei Liu, Xi Lin, Zhenkun Wang, Shunyu Yao, Xialiang Tong, Mingxuan Yuan, Qingfu Zhang
19 Oct 2023
How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations
International Conference on Learning Representations (ICLR), 2023
Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai
16 Oct 2023

Generative Calibration for In-context Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Zhongtao Jiang, Yuanzhe Zhang, Cao Liu, Jun Zhao, Kang Liu
16 Oct 2023

Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining
International Conference on Learning Representations (ICLR), 2023
Licong Lin, Yu Bai, Song Mei
OffRL · 12 Oct 2023

Do pretrained Transformers Learn In-Context by Gradient Descent?
Lingfeng Shen, Aayush Mishra, Daniel Khashabi
12 Oct 2023

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
International Conference on Learning Representations (ICLR), 2023
Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Peter L. Bartlett
12 Oct 2023
Is attention required for ICL? Exploring the Relationship Between Model Architecture and In-Context Learning Ability
International Conference on Learning Representations (ICLR), 2023
Ivan Lee, Nan Jiang, Taylor Berg-Kirkpatrick
12 Oct 2023

In-Context Unlearning: Language Models as Few Shot Unlearners
International Conference on Machine Learning (ICML), 2023
Martin Pawelczyk, Seth Neel, Himabindu Lakkaraju
MU · 11 Oct 2023

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations
Zeming Wei, Yifei Wang, Ang Li, Yichuan Mo, Yisen Wang
10 Oct 2023

A Meta-Learning Perspective on Transformers for Causal Language Modeling
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Xinbo Wu, Lav Varshney
09 Oct 2023

In-Context Convergence of Transformers
International Conference on Machine Learning (ICML), 2023
Yu Huang, Yuan Cheng, Yingbin Liang
MLT · 08 Oct 2023
The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning
Tian Jin, Nolan Clement, Xin Dong, Vaishnavh Nagarajan, Michael Carbin, Jonathan Ragan-Kelley, Gintare Karolina Dziugaite
LRM · 07 Oct 2023

Fine-tune Language Models to Approximate Unbiased In-context Learning
Timothy Chu, Zhao Song, Chiwun Yang
05 Oct 2023

Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions
International Conference on Learning Representations (ICLR), 2023
S. Bhattamishra, Arkil Patel, Phil Blunsom, Varun Kanade
04 Oct 2023

Linear attention is (maybe) all you need (to understand transformer optimization)
International Conference on Learning Representations (ICLR), 2023
Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, S. Sra
02 Oct 2023

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
International Conference on Learning Representations (ICLR), 2023
Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon Shaolei Du
01 Oct 2023
Decoding In-Context Learning: Neuroscience-inspired Analysis of Representations in Large Language Models
Safoora Yousefi, Leo Betthauser, Hosein Hasanbeig, Raphael Milliere, Ida Momennejad
30 Sep 2023

Understanding In-Context Learning from Repetitions
International Conference on Learning Representations (ICLR), 2023
Jianhao Yan, Jin Xu, Chiyu Song, Chenming Wu, Yafu Li, Yue Zhang
30 Sep 2023

Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency
Zhihan Liu, Hao Hu, Shenao Zhang, Hongyi Guo, Shuqi Ke, Boyi Liu, Zhaoran Wang
LLMAG · LRM · 29 Sep 2023

A Benchmark for Learning to Translate a New Language from One Grammar Book
International Conference on Learning Representations (ICLR), 2023
Garrett Tanzer, Mirac Suzgun, Chenguang Xi, Dan Jurafsky, Luke Melas-Kyriazi
28 Sep 2023

Understanding Catastrophic Forgetting in Language Models via Implicit Inference
International Conference on Learning Representations (ICLR), 2023
Suhas Kotha, Jacob Mitchell Springer, Aditi Raghunathan
CLL · 18 Sep 2023
Context is Environment
International Conference on Learning Representations (ICLR), 2023
Sharut Gupta, Stefanie Jegelka, David Lopez-Paz, Kartik Ahuja
18 Sep 2023

Breaking through the learning plateaus of in-context learning in Transformer
International Conference on Machine Learning (ICML), 2023
Jingwen Fu, Tao Yang, Yuwang Wang, Yan Lu, Nanning Zheng
12 Sep 2023

Uncovering mesa-optimization algorithms in Transformers
J. Oswald, Eyvind Niklasson, Maximilian Schlegel, Seijin Kobayashi, Nicolas Zucchet, ..., Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento
11 Sep 2023

An Appraisal-Based Chain-Of-Emotion Architecture for Affective Language Model Game Agents
PLoS ONE, 2023
Maximilian Croissant, Madeleine Frister, Guy Schofield, Cade McCall
LLMAG · 10 Sep 2023

Are Emergent Abilities in Large Language Models just In-Context Learning?
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Sheng Lu, Irina Bigoulaeva, Rachneet Sachdeva, Harish Tayyar Madabushi, Iryna Gurevych
LRM · ELM · ReLM · 04 Sep 2023
Gated recurrent neural networks discover attention
Nicolas Zucchet, Seijin Kobayashi, Yassir Akram, J. Oswald, Maxime Larcher, Angelika Steger, João Sacramento
04 Sep 2023

Adversarial Fine-Tuning of Language Models: An Iterative Optimisation Approach for the Generation and Detection of Problematic Content
Charles O'Neill, Jack Miller, I. Ciucă, Y. Ting 丁, Thang Bui
26 Aug 2023

Causal Intersectionality and Dual Form of Gradient Descent for Multimodal Analysis: a Case Study on Hateful Memes
International Conference on Language Resources and Evaluation (LREC), 2023
Yosuke Miyanishi, Minh Le Nguyen
19 Aug 2023

Inductive-bias Learning: Generating Code Models with Large Language Model
Toma Tanaka, Naofumi Emoto, Tsukasa Yumibayashi
AI4CE · 19 Aug 2023

CausalLM is not optimal for in-context learning
International Conference on Learning Representations (ICLR), 2023
Nan Ding, Tomer Levinboim, Jialin Wu, Sebastian Goodman, Radu Soricut
14 Aug 2023
In-Context Learning Learns Label Relationships but Is Not Conventional Learning
International Conference on Learning Representations (ICLR), 2023
Jannik Kossen, Y. Gal, Tom Rainforth
23 Jul 2023

What can a Single Attention Layer Learn? A Study Through the Random Features Lens
Neural Information Processing Systems (NeurIPS), 2023
Hengyu Fu, Tianyu Guo, Yu Bai, Song Mei
MLT · 21 Jul 2023

SINC: Self-Supervised In-Context Learning for Vision-Language Tasks
IEEE International Conference on Computer Vision (ICCV), 2023
Yi-Syuan Chen, Yun-Zhu Song, Cheng Yu Yeo, Bei Liu, Jianlong Fu, Hong-Han Shuai
VLM · LRM · 15 Jul 2023

Large Language Models as General Pattern Machines
Conference on Robot Learning (CoRL), 2023
Suvir Mirchandani, F. Xia, Peter R. Florence, Brian Ichter, Danny Driess, Montse Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, Andy Zeng
LLMAG · 10 Jul 2023

Graph Neural Networks as an Enabler of Terahertz-based Flow-guided Nanoscale Localization over Highly Erroneous Raw Data
IEEE Journal on Selected Areas in Communications (JSAC), 2023
Gerard Calvo Bartra, Filip Lemic, Guillem Pascual, S. Abadal, Jakob Struye, Carmen Delgado, Xavier Costa Pérez
09 Jul 2023
09 Jul 2023
Bidirectional Attention as a Mixture of Continuous Word Experts
Conference on Uncertainty in Artificial Intelligence (UAI), 2023
Kevin Christian Wibisono
Yixin Wang
MoE
91
0
0
08 Jul 2023
One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention
International Conference on Learning Representations (ICLR), 2023
Arvind V. Mahankali
Tatsunori B. Hashimoto
Tengyu Ma
MLT
140
139
0
07 Jul 2023
Scaling In-Context Demonstrations with Structured Attention
Tianle Cai
Kaixuan Huang
Jason D. Lee
Mengdi Wang
LRM
142
9
0
05 Jul 2023
Trainable Transformer in Transformer
International Conference on Machine Learning (ICML), 2023
A. Panigrahi
Sadhika Malladi
Mengzhou Xia
Sanjeev Arora
VLM
292
13
0
03 Jul 2023
Understanding In-Context Learning via Supportive Pretraining Data
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Xiaochuang Han
Daniel Simig
Todor Mihaylov
Yulia Tsvetkov
Asli Celikyilmaz
Tianlu Wang
AIMat
206
46
0
26 Jun 2023
Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression
Neural Information Processing Systems (NeurIPS), 2023
Allan Raventós, Mansheej Paul, F. Chen, Surya Ganguli
26 Jun 2023

Supervised Pretraining Can Learn In-Context Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2023
Jonathan Lee, Annie Xie, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill
OffRL · 26 Jun 2023

SynerGPT: In-Context Learning for Personalized Drug Synergy Prediction and Drug Design
bioRxiv, 2023
Carl Edwards, Aakanksha Naik, Tushar Khot, Martin D. Burke, Heng Ji, Kyle Lo
19 Jun 2023

Trained Transformers Learn Linear Models In-Context
Journal of Machine Learning Research (JMLR), 2023
Ruiqi Zhang, Spencer Frei, Peter L. Bartlett
16 Jun 2023

TART: A plug-and-play Transformer module for task-agnostic reasoning
Neural Information Processing Systems (NeurIPS), 2023
Kush S. Bhatia, A. Narayan, Chris De Sa, Christopher Ré
LRM · ReLM · VLM · 13 Jun 2023