Transformers learn in-context by gradient descent
arXiv 2212.07677 (v2, latest)
International Conference on Machine Learning (ICML), 2022
15 December 2022
J. Oswald, Eyvind Niklasson, E. Randazzo, João Sacramento, A. Mordvintsev, A. Zhmoginov, Max Vladymyrov
[MLT]
Links: arXiv (abs), PDF, HTML, HuggingFace (1 upvote), GitHub (361★)
Papers citing "Transformers learn in-context by gradient descent" (50 of 457 papers shown)
IM-Context: In-Context Learning for Imbalanced Regression Tasks
Ismail Nejjar, Faez Ahmed, Olga Fink
28 May 2024

On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability
Chenyu Zheng, Wei Huang, Rongzheng Wang, Guoqiang Wu, Jun Zhu, Chongxuan Li
27 May 2024

Automatic Domain Adaptation by Transformers in In-Context Learning
Ryuichiro Hataya, Kota Matsui, Masaaki Imaizumi
27 May 2024

Mixture of In-Context Prompters for Tabular PFNs
Derek Xu, Olcay Cirit, Reza Asadi, Luke Huan, Wei Wang
25 May 2024

Towards Better Understanding of In-Context Learning Ability from In-Context Uncertainty Quantification
Shang Liu, Zhongze Cai, Guanting Chen, Xiaocheng Li
24 May 2024 [UQCV]

Understanding the Training and Generalization of Pretrained Transformer for Sequential Decision Making
Hanzhao Wang, Yu Pan, Fupeng Sun, Shang Liu, Kalyan Talluri, Guanting Chen, Xiaocheng Li
23 May 2024 [OffRL]

Implicit In-context Learning
International Conference on Learning Representations (ICLR), 2024
Zhuowei Li, Zihao Xu, Ligong Han, Yunhe Gao, Song Wen, Di Liu, Hao Wang, Dimitris N. Metaxas
23 May 2024

DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning
Zijian Zhou, Xiaoqiang Lin, Xinyi Xu, Alok Prakash, Daniela Rus, K. H. Low
22 May 2024

Asymptotic theory of in-context learning by linear attention
Yue M. Lu, Mary I. Letey, Jacob A. Zavatone-Veth, Anindita Maiti, Cengiz Pehlevan
20 May 2024

Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
Hanlin Zhu, Baihe Huang, Shaolun Zhang, Michael I. Jordan, Jiantao Jiao, Yuandong Tian, Stuart Russell
07 May 2024 [LRM, AI4CE]

Locally Differentially Private In-Context Learning
Chunyan Zheng, Keke Sun, Wenhao Zhao, Haibo Zhou, Lixin Jiang, Shaoyang Song, Chunlai Zhou
07 May 2024

Finite Sample Analysis and Bounds of Generalization Error of Gradient Descent in In-Context Linear Regression
Karthik Duraisamy
03 May 2024 [MLT]

Position: Understanding LLMs Requires More Than Statistical Generalization
International Conference on Machine Learning (ICML), 2024
Patrik Reizinger, Szilvia Ujváry, Anna Mészáros, A. Kerekes, Wieland Brendel, Ferenc Huszár
03 May 2024

Creative Problem Solving in Large Language and Vision Models -- What Would it Take?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Lakshmi Nair, Evana Gizzi, Jivko Sinapov
02 May 2024 [MLLM]

CultiVerse: Towards Cross-Cultural Understanding for Paintings with Large Language Model
Wei Zhang, Wong Kam-Kwai, Biying Xu, Yiwen Ren, Yuhuai Li, Minfeng Zhu, Yingchaojie Feng, Wei Chen
01 May 2024

In-Context Learning with Long-Context Models: An In-Depth Exploration
Amanda Bertsch, Maor Ivgi, Uri Alon, Jonathan Berant, Matthew R. Gormley, Graham Neubig
30 Apr 2024 [ReLM, AIMat]

Exploring the Robustness of In-Context Learning with Noisy Labels
Chen Cheng, Xinzhi Yu, Haodong Wen, Jinsong Sun, Guanzhang Yue, Yihao Zhang, Zeming Wei
28 Apr 2024 [NoLa]

What Makes Multimodal In-Context Learning Work?
Folco Bertini Baldassini, Mustafa Shukor, Matthieu Cord, Laure Soulier, Benjamin Piwowarski
24 Apr 2024

Setting up the Data Printer with Improved English to Ukrainian Machine Translation
Yurii Paniv, Dmytro Chaplynskyi, Nikita Trynus, Volodymyr Kyrylov
23 Apr 2024 [AI4CE]

In-Context Learning State Vector with Inner and Momentum Optimization
Dongfang Li, Zhenyu Liu, Xinshuo Hu, Zetian Sun, Baotian Hu, Min Zhang
17 Apr 2024

Many-Shot In-Context Learning
Rishabh Agarwal, Avi Singh, Lei M. Zhang, Bernd Bohnet, Luis Rosias, ..., John D. Co-Reyes, Eric Chu, Feryal M. P. Behbahani, Aleksandra Faust, Hugo Larochelle
17 Apr 2024 [ReLM, OffRL, BDL]

Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning
Xiao Wang, Tianze Chen, Xianjun Yang, Tao Gui, Xun Zhao, Dahua Lin
16 Apr 2024 [ELM]

Decomposing Label Space, Format and Discrimination: Rethinking How LLMs Respond and Solve Tasks via In-Context Learning
Quanyu Long, Yin Wu, Wenya Wang, Sinno Jialin Pan
11 Apr 2024

From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples
Robert Vacareanu, Vlad-Andrei Negru, Vasile Suciu, Mihai Surdeanu
11 Apr 2024

Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models
Sebastian Bordt, Harsha Nori, Vanessa Rodrigues, Besmira Nushi, Rich Caruana
09 Apr 2024

How does Multi-Task Training Affect Transformer In-Context Capabilities? Investigations with Function Classes
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Harmon Bhasin, Timothy Ossowski, Yiqiao Zhong, Junjie Hu
04 Apr 2024

Deconstructing In-Context Learning: Understanding Prompts via Corruption
International Conference on Language Resources and Evaluation (LREC), 2024
Namrata Shivagunde, Vladislav Lialin, Sherin Muckatira, Anna Rumshisky
02 Apr 2024

What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks
International Conference on Machine Learning (ICML), 2024
Xingwu Chen, Difan Zou
02 Apr 2024 [ViT]

IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations
Deqing Fu, Ghazal Khalighinejad, Ollie Liu, Bhuwan Dhingra, Dani Yogatama, Robin Jia, Willie Neiswanger
01 Apr 2024

Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics
Norman Di Palo, Edward Johns
28 Mar 2024

The Topos of Transformer Networks
Mattia Jacopo Villani, Peter McBurney
27 Mar 2024

Can large language models explore in-context?
Neural Information Processing Systems (NeurIPS), 2024
Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins
22 Mar 2024 [LM&Ro, LLMAG, LRM]

Computational Models to Study Language Processing in the Human Brain: A Survey
Shaonan Wang, Jingyuan Sun, Yunhao Zhang, Nan Lin, Marie-Francine Moens, Chengqing Zong
20 Mar 2024

Transfer Learning Beyond Bounded Density Ratios
Alkis Kalavasis, Ilias Zadik, Manolis Zampetakis
18 Mar 2024

Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment
Feifan Song, Bowen Yu, Hao Lang, Haiyang Yu, Fei Huang, Houfeng Wang, Yongbin Li
17 Mar 2024 [ALM]

Mechanics of Next Token Prediction with Self-Attention
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Yingcong Li, Yixiao Huang, M. E. Ildiz, A. S. Rawat, Samet Oymak
12 Mar 2024

Transformers Learn Low Sensitivity Functions: Investigations and Implications
International Conference on Learning Representations (ICLR), 2024
Bhavya Vasudeva, Deqing Fu, Tianyi Zhou, Elliott Kau, Youqi Huang, Willie Neiswanger
11 Mar 2024

How Well Can Transformers Emulate In-context Newton's Method?
Angeliki Giannou, Liu Yang, Tianhao Wang, Dimitris Papailiopoulos, Jason D. Lee
05 Mar 2024

LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History
Akash Gupta, Ivaxi Sheth, Vyas Raina, Mark Gales, Mario Fritz
28 Feb 2024

Case-Based or Rule-Based: How Do Transformers Do the Math?
Yi Hu, Xiaojuan Tang, Haotong Yang, Muhan Zhang
27 Feb 2024 [LRM]

Investigating the Effectiveness of HyperTuning via Gisting
Jason Phang
26 Feb 2024

RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions
Yuan Zhang, Xiao Wang, Zhiheng Xi, Han Xia, Tao Gui, Xuanjing Huang
26 Feb 2024

How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?
Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen
23 Feb 2024 [MLT]

In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization
Ruiqi Zhang, Jingfeng Wu, Peter L. Bartlett
22 Feb 2024

Prompting a Pretrained Transformer Can Be a Universal Approximator
Aleksandar Petrov, Juil Sock, Adel Bibi
22 Feb 2024

Linear Transformers are Versatile In-Context Learners
Max Vladymyrov, J. Oswald, Mark Sandler, Rong Ge
21 Feb 2024

Do Efficient Transformers Really Save Computation?
Kai-Bo Yang, Jan Ackermann, Zhenyu He, Guhao Feng, Bohang Zhang, Yunzhen Feng, Qiwei Ye, Di He, Liwei Wang
21 Feb 2024

AlgoFormer: An Efficient Transformer Framework with Algorithmic Structures
Yihang Gao, Chuanyang Zheng, Enze Xie, Han Shi, Tianyang Hu, Yu Li, Michael K. Ng, Zhenguo Li, Zhaoqiang Liu
21 Feb 2024

From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers
M. E. Ildiz, Yixiao Huang, Yingcong Li, A. S. Rawat, Samet Oymak
21 Feb 2024

The Impact of Demonstrations on Multilingual In-Context Learning: A Multidimensional Analysis
Miaoran Zhang, Vagrant Gautam, Mingyang Wang, Jesujoba Oluwadara Alabi, Xiaoyu Shen, Dietrich Klakow, Marius Mosbach
20 Feb 2024