ResearchTrend.AI
Transformers learn in-context by gradient descent
International Conference on Machine Learning (ICML), 2022
arXiv:2212.07677 · 15 December 2022
J. Oswald, Eyvind Niklasson, E. Randazzo, João Sacramento, A. Mordvintsev, A. Zhmoginov, Max Vladymyrov
Links: arXiv (abs) · PDF · HTML · HuggingFace (1 upvote) · GitHub (361★)

Papers citing "Transformers learn in-context by gradient descent"

Showing 50 of 457 citing papers.
IM-Context: In-Context Learning for Imbalanced Regression Tasks
  Ismail Nejjar, Faez Ahmed, Olga Fink (28 May 2024)

On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability
  Chenyu Zheng, Wei Huang, Rongzheng Wang, Guoqiang Wu, Jun Zhu, Chongxuan Li (27 May 2024)

Automatic Domain Adaptation by Transformers in In-Context Learning
  Ryuichiro Hataya, Kota Matsui, Masaaki Imaizumi (27 May 2024)

Mixture of In-Context Prompters for Tabular PFNs
  Derek Xu, Olcay Cirit, Reza Asadi, Luke Huan, Wei Wang (25 May 2024)

Towards Better Understanding of In-Context Learning Ability from In-Context Uncertainty Quantification
  Shang Liu, Zhongze Cai, Guanting Chen, Xiaocheng Li (24 May 2024)

Understanding the Training and Generalization of Pretrained Transformer for Sequential Decision Making
  Hanzhao Wang, Yu Pan, Fupeng Sun, Shang Liu, Kalyan Talluri, Guanting Chen, Xiaocheng Li (23 May 2024)

Implicit In-context Learning (ICLR 2024)
  Zhuowei Li, Zihao Xu, Ligong Han, Yunhe Gao, Song Wen, Di Liu, Hao Wang, Dimitris N. Metaxas (23 May 2024)

DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning
  Zijian Zhou, Xiaoqiang Lin, Xinyi Xu, Alok Prakash, Daniela Rus, K. H. Low (22 May 2024)
Asymptotic theory of in-context learning by linear attention
  Yue M. Lu, Mary I. Letey, Jacob A. Zavatone-Veth, Anindita Maiti, Cengiz Pehlevan (20 May 2024)

Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
  Hanlin Zhu, Baihe Huang, Shaolun Zhang, Michael I. Jordan, Jiantao Jiao, Yuandong Tian, Stuart Russell (07 May 2024)

Locally Differentially Private In-Context Learning
  Chunyan Zheng, Keke Sun, Wenhao Zhao, Haibo Zhou, Lixin Jiang, Shaoyang Song, Chunlai Zhou (07 May 2024)

Finite Sample Analysis and Bounds of Generalization Error of Gradient Descent in In-Context Linear Regression
  Karthik Duraisamy (03 May 2024)

Position: Understanding LLMs Requires More Than Statistical Generalization (ICML 2024)
  Patrik Reizinger, Szilvia Ujváry, Anna Mészáros, A. Kerekes, Wieland Brendel, Ferenc Huszár (03 May 2024)

Creative Problem Solving in Large Language and Vision Models -- What Would it Take? (EMNLP 2024)
  Lakshmi Nair, Evana Gizzi, Jivko Sinapov (02 May 2024)

CultiVerse: Towards Cross-Cultural Understanding for Paintings with Large Language Model
  Wei Zhang, Wong Kam-Kwai, Biying Xu, Yiwen Ren, Yuhuai Li, Minfeng Zhu, Yingchaojie Feng, Wei Chen (01 May 2024)

In-Context Learning with Long-Context Models: An In-Depth Exploration
  Amanda Bertsch, Maor Ivgi, Uri Alon, Jonathan Berant, Matthew R. Gormley, Graham Neubig (30 Apr 2024)
Exploring the Robustness of In-Context Learning with Noisy Labels
  Chen Cheng, Xinzhi Yu, Haodong Wen, Jinsong Sun, Guanzhang Yue, Yihao Zhang, Zeming Wei (28 Apr 2024)

What Makes Multimodal In-Context Learning Work?
  Folco Bertini Baldassini, Mustafa Shukor, Matthieu Cord, Laure Soulier, Benjamin Piwowarski (24 Apr 2024)

Setting up the Data Printer with Improved English to Ukrainian Machine Translation
  Yurii Paniv, Dmytro Chaplynskyi, Nikita Trynus, Volodymyr Kyrylov (23 Apr 2024)

In-Context Learning State Vector with Inner and Momentum Optimization
  Dongfang Li, Zhenyu Liu, Xinshuo Hu, Zetian Sun, Baotian Hu, Min Zhang (17 Apr 2024)

Many-Shot In-Context Learning
  Rishabh Agarwal, Avi Singh, Lei M. Zhang, Bernd Bohnet, Luis Rosias, ..., John D. Co-Reyes, Eric Chu, Feryal M. P. Behbahani, Aleksandra Faust, Hugo Larochelle (17 Apr 2024)

Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning
  Xiao Wang, Tianze Chen, Xianjun Yang, Tao Gui, Xun Zhao, Dahua Lin (16 Apr 2024)

Decomposing Label Space, Format and Discrimination: Rethinking How LLMs Respond and Solve Tasks via In-Context Learning
  Quanyu Long, Yin Wu, Wenya Wang, Sinno Jialin Pan (11 Apr 2024)

From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples
  Robert Vacareanu, Vlad-Andrei Negru, Vasile Suciu, Mihai Surdeanu (11 Apr 2024)
Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models
  Sebastian Bordt, Harsha Nori, Vanessa Rodrigues, Besmira Nushi, Rich Caruana (09 Apr 2024)

How does Multi-Task Training Affect Transformer In-Context Capabilities? Investigations with Function Classes (NAACL 2024)
  Harmon Bhasin, Timothy Ossowski, Yiqiao Zhong, Junjie Hu (04 Apr 2024)

Deconstructing In-Context Learning: Understanding Prompts via Corruption (LREC 2024)
  Namrata Shivagunde, Vladislav Lialin, Sherin Muckatira, Anna Rumshisky (02 Apr 2024)

What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks (ICML 2024)
  Xingwu Chen, Difan Zou (02 Apr 2024)

IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations
  Deqing Fu, Ghazal Khalighinejad, Ollie Liu, Bhuwan Dhingra, Dani Yogatama, Robin Jia, Willie Neiswanger (01 Apr 2024)

Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics
  Norman Di Palo, Edward Johns (28 Mar 2024)

The Topos of Transformer Networks
  Mattia Jacopo Villani, Peter McBurney (27 Mar 2024)

Can large language models explore in-context? (NeurIPS 2024)
  Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins (22 Mar 2024)

Computational Models to Study Language Processing in the Human Brain: A Survey
  Shaonan Wang, Jingyuan Sun, Yunhao Zhang, Nan Lin, Marie-Francine Moens, Chengqing Zong (20 Mar 2024)

Transfer Learning Beyond Bounded Density Ratios
  Alkis Kalavasis, Ilias Zadik, Manolis Zampetakis (18 Mar 2024)
Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment
  Feifan Song, Bowen Yu, Hao Lang, Haiyang Yu, Fei Huang, Houfeng Wang, Yongbin Li (17 Mar 2024)

Mechanics of Next Token Prediction with Self-Attention (AISTATS 2024)
  Yingcong Li, Yixiao Huang, M. E. Ildiz, A. S. Rawat, Samet Oymak (12 Mar 2024)

Transformers Learn Low Sensitivity Functions: Investigations and Implications (ICLR 2024)
  Bhavya Vasudeva, Deqing Fu, Tianyi Zhou, Elliott Kau, Youqi Huang, Willie Neiswanger (11 Mar 2024)

How Well Can Transformers Emulate In-context Newton's Method?
  Angeliki Giannou, Liu Yang, Tianhao Wang, Dimitris Papailiopoulos, Jason D. Lee (05 Mar 2024)

LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History
  Akash Gupta, Ivaxi Sheth, Vyas Raina, Mark Gales, Mario Fritz (28 Feb 2024)

Case-Based or Rule-Based: How Do Transformers Do the Math?
  Yi Hu, Xiaojuan Tang, Haotong Yang, Muhan Zhang (27 Feb 2024)

Investigating the Effectiveness of HyperTuning via Gisting
  Jason Phang (26 Feb 2024)

RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions
  Yuan Zhang, Xiao Wang, Zhiheng Xi, Han Xia, Tao Gui, Xuanjing Huang (26 Feb 2024)

How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?
  Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen (23 Feb 2024)

In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization
  Ruiqi Zhang, Jingfeng Wu, Peter L. Bartlett (22 Feb 2024)

Prompting a Pretrained Transformer Can Be a Universal Approximator
  Aleksandar Petrov, Juil Sock, Adel Bibi (22 Feb 2024)
Linear Transformers are Versatile In-Context Learners
  Max Vladymyrov, J. Oswald, Mark Sandler, Rong Ge (21 Feb 2024)

Do Efficient Transformers Really Save Computation?
  Kai-Bo Yang, Jan Ackermann, Zhenyu He, Guhao Feng, Bohang Zhang, Yunzhen Feng, Qiwei Ye, Di He, Liwei Wang (21 Feb 2024)

AlgoFormer: An Efficient Transformer Framework with Algorithmic Structures
  Yihang Gao, Chuanyang Zheng, Enze Xie, Han Shi, Tianyang Hu, Yu Li, Michael K. Ng, Zhenguo Li, Zhaoqiang Liu (21 Feb 2024)

From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers
  M. E. Ildiz, Yixiao Huang, Yingcong Li, A. S. Rawat, Samet Oymak (21 Feb 2024)

The Impact of Demonstrations on Multilingual In-Context Learning: A Multidimensional Analysis
  Miaoran Zhang, Vagrant Gautam, Mingyang Wang, Jesujoba Oluwadara Alabi, Xiaoyu Shen, Dietrich Klakow, Marius Mosbach (20 Feb 2024)