Transformers learn in-context by gradient descent

International Conference on Machine Learning (ICML), 2022

15 December 2022

Papers citing "Transformers learn in-context by gradient descent"

50 / 457 papers shown

Evolving AI Collectives to Enhance Human Diversity and Enable Self-Regulation

267

19 Feb 2024

Visual In-Context Learning for Large Vision-Language Models

205

114

18 Feb 2024

The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains

256

16 Feb 2024

Pelican Soup Framework: A Theoretical Framework for Language Model Capabilities

Ting-Rui Chiang

Dani Yogatama

169

16 Feb 2024

The dynamic interplay between in-context and in-weight learning in humans and neural networks

Jacob Russin

Ellie Pavlick

Michael J. Frank

308

13 Feb 2024

How do Transformers perform In-Context Autoregressive Learning?

271

08 Feb 2024

Implicit Bias and Fast Convergence Rates for Self-attention

Bhavya Vasudeva

Puneesh Deora

Christos Thrampoulidis

389

08 Feb 2024

Towards Understanding Inductive Bias in Transformers: A View From Infinity

Itay Lavie

Guy Gur-Ari

Zohar Ringel

283

07 Feb 2024

Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks

Dimitris Papailiopoulos

391

103

06 Feb 2024

Understanding the Effect of Noise in LLM Training Data with Algorithmic Chains of Thought

Alex Havrilla

Maia Iyer

289

06 Feb 2024

In-context learning agents are asymmetric belief updaters

182

06 Feb 2024

A phase transition between positional and semantic learning in a solvable model of dot-product attention

Neural Information Processing Systems (NeurIPS), 2024

Lenka Zdeborová

247

06 Feb 2024

Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains

Ashok Vardhan Makkuva

392

06 Feb 2024

Attention Meets Post-hoc Interpretability: A Mathematical Perspective

International Conference on Machine Learning (ICML), 2024

Gianluigi Lopardo

F. Precioso

Damien Garreau

249

05 Feb 2024

C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models

International Conference on Machine Learning (ICML), 2024

Nezihe Merve Gürel

460

05 Feb 2024

Is Mamba Capable of In-Context Learning?

Thomas Brox

239

05 Feb 2024

Data Poisoning for In-context Learning

393

03 Feb 2024

Can MLLMs Perform Text-to-Image In-Context Learning?

263

02 Feb 2024

Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape

Juno Kim

Taiji Suzuki

367

02 Feb 2024

LLMs learn governing principles of dynamical systems, revealing an in-context neural scaling law

239

01 Feb 2024

Theoretical Understanding of In-Context Learning in Shallow Transformers with Unstructured Data

236

01 Feb 2024

The Information of Large Language Model Geometry

Zhiquan Tan

Chenghai Li

Weiran Huang

224

01 Feb 2024

Superiority of Multi-Head Attention in In-Context Linear Regression

205

30 Jan 2024

An Information-Theoretic Analysis of In-Context Learning

International Conference on Machine Learning (ICML), 2024

357

28 Jan 2024

In-Context Language Learning: Architectures and Algorithms

International Conference on Machine Learning (ICML), 2024

Bailin Wang

388

23 Jan 2024

Enhancing In-context Learning via Linear Probe Calibration

International Conference on Artificial Intelligence and Statistics (AISTATS), 2024

Tianyi Chen

242

22 Jan 2024

In-context Learning with Retrieved Demonstrations for Language Models: A Survey

707

21 Jan 2024

Anchor function: a type of benchmark functions for studying language models

340

16 Jan 2024

AI-as-exploration: Navigating intelligence space

Dimitri Coelho Mollo

240

15 Jan 2024

A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models

Annual Review of Statistics and Its Application (ARSIA), 2024

Namjoon Suh

Guang Cheng

MedIm

350

14 Jan 2024

Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Rui Yan

316

12 Jan 2024

Setting the Record Straight on Transformer Oversmoothing

G. Dovonon

M. Bronstein

Matt J. Kusner

406

09 Jan 2024

Robust Stochastically-Descending Unrolled Networks

Samar Hadou

Navid Naderializadeh

Alejandro Ribeiro

324

25 Dec 2023

Emergence of In-Context Reinforcement Learning from Noise Distillation

374

19 Dec 2023

Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context

Xiang Cheng

Yuxin Chen

S. Sra

618

11 Dec 2023

Generalization to New Sequential Decision Making Tasks with In-Context Learning

Sharath Chandra Raparthy

331

06 Dec 2023

SARA-RT: Scaling up Robotics Transformers with Self-Adaptive Robust Attention

IEEE International Conference on Robotics and Automation (ICRA), 2023

Isabel Leal

Krzysztof Choromanski

...

219

04 Dec 2023

The mechanistic basis of data dependence and abrupt learning in an in-context classification task

International Conference on Learning Representations (ICLR), 2023

Gautam Reddy

311

03 Dec 2023

Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks

International Conference on Machine Learning (ICML), 2023

326

21 Nov 2023

Looped Transformers are Better at Learning Learning Algorithms

International Conference on Learning Representations (ICLR), 2023

Liu Yang

Kangwook Lee

Robert D. Nowak

Dimitris Papailiopoulos

441

21 Nov 2023

Rethinking Large Language Models in Mental Health Applications

365

19 Nov 2023

Exploring the Relationship between In-Context Learning and Instruction Tuning

Hanyu Duan

Yixuan Tang

Yi Yang

Ahmed Abbasi

Kar Yan Tam

220

17 Nov 2023

Transformers can optimally learn regression mixture models

International Conference on Learning Representations (ICLR), 2023

196

14 Nov 2023

The Transient Nature of Emergent In-Context Learning in Transformers

Neural Information Processing Systems (NeurIPS), 2023

470

14 Nov 2023

In-context Learning and Gradient Descent Revisited

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

412

13 Nov 2023

In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering

International Conference on Machine Learning (ICML), 2023

Sheng Liu

Haotian Ye

Lei Xing

James Y. Zou

250

206

11 Nov 2023

In-Context Exemplars as Clues to Retrieving from Large Associative Memory

Jiachen Zhao

290

06 Nov 2023

On the Convergence of Encoder-only Shallow Transformers

Neural Information Processing Systems (NeurIPS), 2023

219

02 Nov 2023

Transformers are Provably Optimal In-context Estimators for Wireless Communications

International Conference on Artificial Intelligence and Statistics (AISTATS), 2023

Vishnu Teja Kunde

Vicram Rajagopalan

Chandra Shekhara Kaushik Valmeekam

593

01 Nov 2023

The Expressibility of Polynomial based Attention Scheme

Zhao Song

Guangyi Xu

Junze Yin

313

30 Oct 2023