Transformers learn in-context by gradient descent
arXiv 2212.07677 (v2, latest)
International Conference on Machine Learning (ICML), 2022
15 December 2022
J. Oswald, Eyvind Niklasson, E. Randazzo, João Sacramento, A. Mordvintsev, A. Zhmoginov, Max Vladymyrov
[MLT]
Links: arXiv (abs), PDF, HTML, HuggingFace (1 upvote), GitHub (361★)
Papers citing "Transformers learn in-context by gradient descent" (50 of 457 papers shown)
IM-Context: In-Context Learning for Imbalanced Regression Tasks
Ismail Nejjar, Faez Ahmed, Olga Fink
28 May 2024

On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability
Chenyu Zheng, Wei Huang, Rongzheng Wang, Guoqiang Wu, Jun Zhu, Chongxuan Li
27 May 2024

Automatic Domain Adaptation by Transformers in In-Context Learning
Ryuichiro Hataya, Kota Matsui, Masaaki Imaizumi
27 May 2024

Mixture of In-Context Prompters for Tabular PFNs
Derek Xu, Olcay Cirit, Reza Asadi, Luke Huan, Wei Wang
25 May 2024

Towards Better Understanding of In-Context Learning Ability from In-Context Uncertainty Quantification
Shang Liu, Zhongze Cai, Guanting Chen, Xiaocheng Li
24 May 2024 [UQCV]

Understanding the Training and Generalization of Pretrained Transformer for Sequential Decision Making
Hanzhao Wang, Yu Pan, Fupeng Sun, Shang Liu, Kalyan Talluri, Guanting Chen, Xiaocheng Li
23 May 2024 [OffRL]

Implicit In-context Learning
International Conference on Learning Representations (ICLR), 2024
Zhuowei Li, Zihao Xu, Ligong Han, Yunhe Gao, Song Wen, Di Liu, Hao Wang, Dimitris N. Metaxas
23 May 2024

DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning
Zijian Zhou, Xiaoqiang Lin, Xinyi Xu, Alok Prakash, Daniela Rus, K. H. Low
22 May 2024

Asymptotic theory of in-context learning by linear attention
Yue M. Lu, Mary I. Letey, Jacob A. Zavatone-Veth, Anindita Maiti, Cengiz Pehlevan
20 May 2024

Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
Hanlin Zhu, Baihe Huang, Shaolun Zhang, Michael I. Jordan, Jiantao Jiao, Yuandong Tian, Stuart Russell
07 May 2024 [LRM, AI4CE]

Locally Differentially Private In-Context Learning
Chunyan Zheng, Keke Sun, Wenhao Zhao, Haibo Zhou, Lixin Jiang, Shaoyang Song, Chunlai Zhou
07 May 2024

Finite Sample Analysis and Bounds of Generalization Error of Gradient Descent in In-Context Linear Regression
Karthik Duraisamy
03 May 2024 [MLT]

Position: Understanding LLMs Requires More Than Statistical Generalization
International Conference on Machine Learning (ICML), 2024
Patrik Reizinger, Szilvia Ujváry, Anna Mészáros, A. Kerekes, Wieland Brendel, Ferenc Huszár
03 May 2024

Creative Problem Solving in Large Language and Vision Models -- What Would it Take?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Lakshmi Nair, Evana Gizzi, Jivko Sinapov
02 May 2024 [MLLM]

CultiVerse: Towards Cross-Cultural Understanding for Paintings with Large Language Model
Wei Zhang, Wong Kam-Kwai, Biying Xu, Yiwen Ren, Yuhuai Li, Minfeng Zhu, Yingchaojie Feng, Wei Chen
01 May 2024

In-Context Learning with Long-Context Models: An In-Depth Exploration
Amanda Bertsch, Maor Ivgi, Uri Alon, Jonathan Berant, Matthew R. Gormley, Graham Neubig
30 Apr 2024 [ReLM, AIMat]

Exploring the Robustness of In-Context Learning with Noisy Labels
Chen Cheng, Xinzhi Yu, Haodong Wen, Jinsong Sun, Guanzhang Yue, Yihao Zhang, Zeming Wei
28 Apr 2024 [NoLa]

What Makes Multimodal In-Context Learning Work?
Folco Bertini Baldassini, Mustafa Shukor, Matthieu Cord, Laure Soulier, Benjamin Piwowarski
24 Apr 2024

Setting up the Data Printer with Improved English to Ukrainian Machine Translation
Yurii Paniv, Dmytro Chaplynskyi, Nikita Trynus, Volodymyr Kyrylov
23 Apr 2024 [AI4CE]

In-Context Learning State Vector with Inner and Momentum Optimization
Dongfang Li, Zhenyu Liu, Xinshuo Hu, Zetian Sun, Baotian Hu, Min Zhang
17 Apr 2024

Many-Shot In-Context Learning
Rishabh Agarwal, Avi Singh, Lei M. Zhang, Bernd Bohnet, Luis Rosias, ..., John D. Co-Reyes, Eric Chu, Feryal M. P. Behbahani, Aleksandra Faust, Hugo Larochelle
17 Apr 2024 [ReLM, OffRL, BDL]

Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning
Xiao Wang, Tianze Chen, Xianjun Yang, Tao Gui, Xun Zhao, Dahua Lin
16 Apr 2024 [ELM]

Decomposing Label Space, Format and Discrimination: Rethinking How LLMs Respond and Solve Tasks via In-Context Learning
Quanyu Long, Yin Wu, Wenya Wang, Sinno Jialin Pan
11 Apr 2024

From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples
Robert Vacareanu, Vlad-Andrei Negru, Vasile Suciu, Mihai Surdeanu
11 Apr 2024

Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models
Sebastian Bordt, Harsha Nori, Vanessa Rodrigues, Besmira Nushi, Rich Caruana
09 Apr 2024

How does Multi-Task Training Affect Transformer In-Context Capabilities? Investigations with Function Classes
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Harmon Bhasin, Timothy Ossowski, Yiqiao Zhong, Junjie Hu
04 Apr 2024

Deconstructing In-Context Learning: Understanding Prompts via Corruption
International Conference on Language Resources and Evaluation (LREC), 2024
Namrata Shivagunde, Vladislav Lialin, Sherin Muckatira, Anna Rumshisky
02 Apr 2024

What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks
International Conference on Machine Learning (ICML), 2024
Xingwu Chen, Difan Zou
02 Apr 2024 [ViT]

IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations
Deqing Fu, Ghazal Khalighinejad, Ollie Liu, Bhuwan Dhingra, Dani Yogatama, Robin Jia, Willie Neiswanger
01 Apr 2024

Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics
Norman Di Palo, Edward Johns
28 Mar 2024

The Topos of Transformer Networks
Mattia Jacopo Villani, Peter McBurney
27 Mar 2024

Can large language models explore in-context?
Neural Information Processing Systems (NeurIPS), 2024
Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins
22 Mar 2024 [LM&Ro, LLMAG, LRM]

Computational Models to Study Language Processing in the Human Brain: A Survey
Shaonan Wang, Jingyuan Sun, Yunhao Zhang, Nan Lin, Marie-Francine Moens, Chengqing Zong
20 Mar 2024

Transfer Learning Beyond Bounded Density Ratios
Alkis Kalavasis, Ilias Zadik, Manolis Zampetakis
18 Mar 2024

Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment
Feifan Song, Bowen Yu, Hao Lang, Haiyang Yu, Fei Huang, Houfeng Wang, Yongbin Li
17 Mar 2024 [ALM]

Mechanics of Next Token Prediction with Self-Attention
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Yingcong Li, Yixiao Huang, M. E. Ildiz, A. S. Rawat, Samet Oymak
12 Mar 2024

Transformers Learn Low Sensitivity Functions: Investigations and Implications
International Conference on Learning Representations (ICLR), 2024
Bhavya Vasudeva, Deqing Fu, Tianyi Zhou, Elliott Kau, Youqi Huang, Willie Neiswanger
11 Mar 2024

How Well Can Transformers Emulate In-context Newton's Method?
Angeliki Giannou, Liu Yang, Tianhao Wang, Dimitris Papailiopoulos, Jason D. Lee
05 Mar 2024

LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History
Akash Gupta, Ivaxi Sheth, Vyas Raina, Mark Gales, Mario Fritz
28 Feb 2024

Case-Based or Rule-Based: How Do Transformers Do the Math?
Yi Hu, Xiaojuan Tang, Haotong Yang, Muhan Zhang
27 Feb 2024 [LRM]

Investigating the Effectiveness of HyperTuning via Gisting
Jason Phang
26 Feb 2024

RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions
Yuan Zhang, Xiao Wang, Zhiheng Xi, Han Xia, Tao Gui, Xuanjing Huang
26 Feb 2024

How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?
Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen
23 Feb 2024 [MLT]

In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization
Ruiqi Zhang, Jingfeng Wu, Peter L. Bartlett
22 Feb 2024

Prompting a Pretrained Transformer Can Be a Universal Approximator
Aleksandar Petrov, Juil Sock, Adel Bibi
22 Feb 2024

Linear Transformers are Versatile In-Context Learners
Max Vladymyrov, J. Oswald, Mark Sandler, Rong Ge
21 Feb 2024

Do Efficient Transformers Really Save Computation?
Kai-Bo Yang, Jan Ackermann, Zhenyu He, Guhao Feng, Bohang Zhang, Yunzhen Feng, Qiwei Ye, Di He, Liwei Wang
21 Feb 2024

AlgoFormer: An Efficient Transformer Framework with Algorithmic Structures
Yihang Gao, Chuanyang Zheng, Enze Xie, Han Shi, Tianyang Hu, Yu Li, Michael K. Ng, Zhenguo Li, Zhaoqiang Liu
21 Feb 2024

From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers
M. E. Ildiz, Yixiao Huang, Yingcong Li, A. S. Rawat, Samet Oymak
21 Feb 2024

The Impact of Demonstrations on Multilingual In-Context Learning: A Multidimensional Analysis
Miaoran Zhang, Vagrant Gautam, Mingyang Wang, Jesujoba Oluwadara Alabi, Xiaoyu Shen, Dietrich Klakow, Marius Mosbach
20 Feb 2024