2212.07677
Transformers learn in-context by gradient descent
International Conference on Machine Learning (ICML), 2023
15 December 2022
J. Oswald
Eyvind Niklasson
E. Randazzo
João Sacramento
A. Mordvintsev
A. Zhmoginov
Max Vladymyrov
MLT
Papers citing
"Transformers learn in-context by gradient descent"
50 / 457 papers shown
Evolving AI Collectives to Enhance Human Diversity and Enable Self-Regulation
Shiyang Lai
Wenbo Guo
Junsol Kim
Richard Zhuang
Dawn Song
James Evans
267
8
0
19 Feb 2024
Visual In-Context Learning for Large Vision-Language Models
Yucheng Zhou
Xiang Li
Qianning Wang
Jianbing Shen
MLLM
205
114
0
18 Feb 2024
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
Benjamin L. Edelman
Ezra Edelman
Surbhi Goel
Eran Malach
Nikolaos Tsilivis
BDL
256
96
0
16 Feb 2024
Pelican Soup Framework: A Theoretical Framework for Language Model Capabilities
Ting-Rui Chiang
Dani Yogatama
169
4
0
16 Feb 2024
The dynamic interplay between in-context and in-weight learning in humans and neural networks
Jacob Russin
Ellie Pavlick
Michael J. Frank
308
4
0
13 Feb 2024
How do Transformers perform In-Context Autoregressive Learning?
Michael E. Sander
Raja Giryes
Taiji Suzuki
Mathieu Blondel
Gabriel Peyré
271
18
0
08 Feb 2024
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva
Puneesh Deora
Christos Thrampoulidis
389
26
0
08 Feb 2024
Towards Understanding Inductive Bias in Transformers: A View From Infinity
Itay Lavie
Guy Gur-Ari
Zohar Ringel
283
10
0
07 Feb 2024
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Jongho Park
Jaeseung Park
Zheyang Xiong
Nayoung Lee
Jaewoong Cho
Samet Oymak
Kangwook Lee
Dimitris Papailiopoulos
391
103
0
06 Feb 2024
Understanding the Effect of Noise in LLM Training Data with Algorithmic Chains of Thought
Alex Havrilla
Maia Iyer
289
19
0
06 Feb 2024
In-context learning agents are asymmetric belief updaters
Johannes A. Schubert
Akshay K. Jagadish
Marcel Binz
Eric Schulz
LLMAG
182
15
0
06 Feb 2024
A phase transition between positional and semantic learning in a solvable model of dot-product attention
Neural Information Processing Systems (NeurIPS), 2024
Hugo Cui
Freya Behrens
Florent Krzakala
Lenka Zdeborová
MLT
247
25
0
06 Feb 2024
Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains
Ashok Vardhan Makkuva
Marco Bondaschi
Adway Girish
Alliot Nagle
Martin Jaggi
Hyeji Kim
Michael C. Gastpar
OffRL
392
38
0
06 Feb 2024
Attention Meets Post-hoc Interpretability: A Mathematical Perspective
International Conference on Machine Learning (ICML), 2024
Gianluigi Lopardo
F. Precioso
Damien Garreau
249
11
0
05 Feb 2024
C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models
International Conference on Machine Learning (ICML), 2024
Mintong Kang
Nezihe Merve Gürel
Ning Yu
Basel Alomair
Yue Liu
460
30
0
05 Feb 2024
Is Mamba Capable of In-Context Learning?
Riccardo Grazzi
Julien N. Siems
Simon Schrodi
Thomas Brox
Frank Hutter
239
56
0
05 Feb 2024
Data Poisoning for In-context Learning
Pengfei He
Han Xu
Yue Xing
Hui Liu
Makoto Yamada
Shucheng Zhou
SILM
AAML
393
24
0
03 Feb 2024
Can MLLMs Perform Text-to-Image In-Context Learning?
Yuchen Zeng
Wonjun Kang
Yicong Chen
Hyung Il Koo
Kangwook Lee
MLLM
263
14
0
02 Feb 2024
Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape
Juno Kim
Taiji Suzuki
367
36
0
02 Feb 2024
LLMs learn governing principles of dynamical systems, revealing an in-context neural scaling law
Toni J. B. Liu
Nicolas Boullé
Raphaël Sarfati
Christopher Earls
AI4TS
239
30
0
01 Feb 2024
Theoretical Understanding of In-Context Learning in Shallow Transformers with Unstructured Data
Yue Xing
Xiaofeng Lin
Chenheng Xu
Namjoon Suh
Qifan Song
Guang Cheng
236
4
0
01 Feb 2024
The Information of Large Language Model Geometry
Zhiquan Tan
Chenghai Li
Weiran Huang
224
6
0
01 Feb 2024
Superiority of Multi-Head Attention in In-Context Linear Regression
Yingqian Cui
Jie Ren
Pengfei He
Shucheng Zhou
Yue Xing
205
22
0
30 Jan 2024
An Information-Theoretic Analysis of In-Context Learning
International Conference on Machine Learning (ICML), 2024
Hong Jun Jeon
Jason D. Lee
Qi Lei
Benjamin Van Roy
357
35
0
28 Jan 2024
In-Context Language Learning: Architectures and Algorithms
International Conference on Machine Learning (ICML), 2024
Ekin Akyürek
Bailin Wang
Yoon Kim
Jacob Andreas
LRM
ReLM
388
80
0
23 Jan 2024
Enhancing In-context Learning via Linear Probe Calibration
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Momin Abbas
Yi Zhou
Parikshit Ram
Nathalie Baracaldo
Horst Samulowitz
Theodoros Salonidis
Tianyi Chen
242
17
0
22 Jan 2024
In-context Learning with Retrieved Demonstrations for Language Models: A Survey
Man Luo
Xin Xu
Yue Liu
Panupong Pasupat
Mehran Kazemi
RALM
707
78
0
21 Jan 2024
Anchor function: a type of benchmark functions for studying language models
Zhongwang Zhang
Zhiwei Wang
Junjie Yao
Zhangchen Zhou
Xiaolong Li
E. Weinan
Z. Xu
340
9
0
16 Jan 2024
AI-as-exploration: Navigating intelligence space
Dimitri Coelho Mollo
240
2
0
15 Jan 2024
A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models
Annual Review of Statistics and Its Application (ARSIA), 2024
Namjoon Suh
Guang Cheng
MedIm
350
18
0
14 Jan 2024
Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Kaiyi Zhang
Ang Lv
Yuhan Chen
Hansen Ha
Tao Xu
Rui Yan
316
25
0
12 Jan 2024
Setting the Record Straight on Transformer Oversmoothing
G. Dovonon
M. Bronstein
Matt J. Kusner
406
12
0
09 Jan 2024
Robust Stochastically-Descending Unrolled Networks
Samar Hadou
Navid Naderializadeh
Alejandro Ribeiro
324
8
0
25 Dec 2023
Emergence of In-Context Reinforcement Learning from Noise Distillation
Ilya Zisman
Vladislav Kurenkov
Alexander Nikulin
Viacheslav Sinii
Sergey Kolesnikov
OffRL
374
24
0
19 Dec 2023
Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context
Xiang Cheng
Yuxin Chen
S. Sra
618
61
0
11 Dec 2023
Generalization to New Sequential Decision Making Tasks with In-Context Learning
Sharath Chandra Raparthy
Eric Hambro
Robert Kirk
Mikael Henaff
Roberta Raileanu
OffRL
331
36
0
06 Dec 2023
SARA-RT: Scaling up Robotics Transformers with Self-Adaptive Robust Attention
IEEE International Conference on Robotics and Automation (ICRA), 2024
Isabel Leal
Krzysztof Choromanski
Deepali Jain
Kumar Avinava Dubey
Jake Varley
...
Q. Vuong
Tamás Sarlós
Kenneth Oslund
Karol Hausman
Kanishka Rao
219
20
0
04 Dec 2023
The mechanistic basis of data dependence and abrupt learning in an in-context classification task
International Conference on Learning Representations (ICLR), 2024
Gautam Reddy
311
92
0
03 Dec 2023
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
International Conference on Machine Learning (ICML), 2024
Rahul Ramesh
Ekdeep Singh Lubana
Mikail Khona
Robert P. Dick
Hidenori Tanaka
CoGe
326
14
0
21 Nov 2023
Looped Transformers are Better at Learning Learning Algorithms
International Conference on Learning Representations (ICLR), 2024
Liu Yang
Kangwook Lee
Robert D. Nowak
Dimitris Papailiopoulos
441
55
0
21 Nov 2023
Rethinking Large Language Models in Mental Health Applications
Shaoxiong Ji
Tianlin Zhang
Kailai Yang
Sophia Ananiadou
Xiaoshi Zhong
LM&MA
AI4MH
365
26
0
19 Nov 2023
Exploring the Relationship between In-Context Learning and Instruction Tuning
Hanyu Duan
Yixuan Tang
Yi Yang
Ahmed Abbasi
Kar Yan Tam
220
14
0
17 Nov 2023
Transformers can optimally learn regression mixture models
International Conference on Learning Representations (ICLR), 2024
Reese Pathak
Rajat Sen
Weihao Kong
Abhimanyu Das
196
15
0
14 Nov 2023
The Transient Nature of Emergent In-Context Learning in Transformers
Neural Information Processing Systems (NeurIPS), 2023
Aaditya K. Singh
Stephanie C. Y. Chan
Ted Moskovitz
Erin Grant
Andrew M. Saxe
Felix Hill
470
63
0
14 Nov 2023
In-context Learning and Gradient Descent Revisited
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Gilad Deutch
Nadav Magar
Tomer Bar Natan
Guy Dar
412
26
0
13 Nov 2023
In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
International Conference on Machine Learning (ICML), 2024
Sheng Liu
Haotian Ye
Lei Xing
James Y. Zou
250
206
0
11 Nov 2023
In-Context Exemplars as Clues to Retrieving from Large Associative Memory
Jiachen Zhao
290
15
0
06 Nov 2023
On the Convergence of Encoder-only Shallow Transformers
Neural Information Processing Systems (NeurIPS), 2023
Yongtao Wu
Fanghui Liu
Grigorios G. Chrysos
Volkan Cevher
219
13
0
02 Nov 2023
Transformers are Provably Optimal In-context Estimators for Wireless Communications
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Vishnu Teja Kunde
Vicram Rajagopalan
Chandra Shekhara Kaushik Valmeekam
Krishna R. Narayanan
S. Shakkottai
D. Kalathil
J. Chamberland
593
12
0
01 Nov 2023
The Expressibility of Polynomial based Attention Scheme
Zhao Song
Guangyi Xu
Junze Yin
313
7
0
30 Oct 2023