Transformers learn in-context by gradient descent
International Conference on Machine Learning (ICML), 2023
15 December 2022 (arXiv:2212.07677)
Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, Max Vladymyrov
Tags: MLT
Links: arXiv (abs) · PDF · HTML · Hugging Face (1 upvote) · GitHub (361★)
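The paper's headline result is constructive: for in-context linear regression, a single softmax-free (linear) self-attention layer with hand-set weights makes the same prediction as one step of gradient descent on the context examples, starting from zero weights. Below is a minimal numpy sketch of that equivalence; the sizes and variable names are illustrative, and the attention layer is reduced to the dot-product form the construction induces rather than a full transformer implementation.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n, lr = 5, 20, 0.1

    # An in-context linear regression task: context pairs (x_j, y_j), one query x_q.
    w_true = rng.normal(size=d)
    X = rng.normal(size=(n, d))   # context inputs
    y = X @ w_true                # context targets
    x_q = rng.normal(size=d)      # query input

    # (a) One gradient-descent step on the in-context least-squares loss,
    #     starting from w = 0: w_1 = (lr / n) * sum_j y_j x_j.
    w_1 = (lr / n) * (y @ X)
    pred_gd = w_1 @ x_q

    # (b) One pass of softmax-free (linear) self-attention whose keys/queries
    #     are the x-parts of the tokens and whose values carry the scaled
    #     targets -- a hand-constructed layer in the spirit of the paper.
    scores = X @ x_q              # <x_j, x_q> for every context token
    pred_attn = (lr / n) * (scores @ y)

    assert np.isclose(pred_gd, pred_attn)  # identical predictions

Both paths compute (lr / n) * sum_j y_j <x_j, x_q>, which is why the forward pass of a suitably weighted linear attention layer can be read as taking a gradient-descent step on the in-context loss.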

Papers citing "Transformers learn in-context by gradient descent"

Showing 50 of 453 citing papers.
Dissecting In-Context Learning of Translations in GPTs
Vikas Raunak, Hany Awadalla, Arul Menezes
24 Oct 2023

Function Vectors in Large Language Models
International Conference on Learning Representations (ICLR), 2023
Eric Todd, Millicent Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, David Bau
23 Oct 2023

Learning to (Learn at Test Time)
Yu Sun, Xinhao Li, Karan Dalal, Chloe Hsu, Oluwasanmi Koyejo, Carlos Guestrin, Xiaolong Wang, Tatsunori Hashimoto, Xinlei Chen
Tags: SSL
20 Oct 2023

On the Optimization and Generalization of Multi-head Attention
Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis
Tags: MLT
19 Oct 2023

Large Language Model for Multi-objective Evolutionary Optimization
International Conference on Evolutionary Multi-Criterion Optimization (EMO), 2023
Fei Liu, Xi Lin, Zhenkun Wang, Shunyu Yao, Xialiang Tong, Mingxuan Yuan, Qingfu Zhang
19 Oct 2023
How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations
International Conference on Learning Representations (ICLR), 2023
Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai
16 Oct 2023

Generative Calibration for In-context Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Zhongtao Jiang, Yuanzhe Zhang, Cao Liu, Jun Zhao, Kang Liu
16 Oct 2023

Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining
International Conference on Learning Representations (ICLR), 2023
Licong Lin, Yu Bai, Song Mei
Tags: OffRL
12 Oct 2023

Do pretrained Transformers Learn In-Context by Gradient Descent?
Lingfeng Shen, Aayush Mishra, Daniel Khashabi
12 Oct 2023

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
International Conference on Learning Representations (ICLR), 2023
Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Peter L. Bartlett
12 Oct 2023
Is attention required for ICL? Exploring the Relationship Between Model Architecture and In-Context Learning Ability
International Conference on Learning Representations (ICLR), 2023
Ivan Lee, Nan Jiang, Taylor Berg-Kirkpatrick
12 Oct 2023

In-Context Unlearning: Language Models as Few Shot Unlearners
International Conference on Machine Learning (ICML), 2023
Martin Pawelczyk, Seth Neel, Himabindu Lakkaraju
Tags: MU
11 Oct 2023

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations
Zeming Wei, Yifei Wang, Ang Li, Yichuan Mo, Yisen Wang
10 Oct 2023

A Meta-Learning Perspective on Transformers for Causal Language Modeling
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Xinbo Wu, Lav Varshney
09 Oct 2023

In-Context Convergence of Transformers
International Conference on Machine Learning (ICML), 2023
Yu Huang, Yuan Cheng, Yingbin Liang
Tags: MLT
08 Oct 2023

The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning
Tian Jin, Nolan Clement, Xin Dong, Vaishnavh Nagarajan, Michael Carbin, Jonathan Ragan-Kelley, Gintare Karolina Dziugaite
Tags: LRM
07 Oct 2023

Fine-tune Language Models to Approximate Unbiased In-context Learning
Timothy Chu, Zhao Song, Chiwun Yang
05 Oct 2023
Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions
International Conference on Learning Representations (ICLR), 2023
S. Bhattamishra, Arkil Patel, Phil Blunsom, Varun Kanade
04 Oct 2023

Linear attention is (maybe) all you need (to understand transformer optimization)
International Conference on Learning Representations (ICLR), 2023
Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, S. Sra
02 Oct 2023

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
International Conference on Learning Representations (ICLR), 2023
Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon Shaolei Du
01 Oct 2023

Decoding In-Context Learning: Neuroscience-inspired Analysis of Representations in Large Language Models
Safoora Yousefi, Leo Betthauser, Hosein Hasanbeig, Raphael Milliere, Ida Momennejad
30 Sep 2023

Understanding In-Context Learning from Repetitions
International Conference on Learning Representations (ICLR), 2023
Jianhao Yan, Jin Xu, Chiyu Song, Chenming Wu, Yafu Li, Yue Zhang
30 Sep 2023
Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency
Zhihan Liu, Hao Hu, Shenao Zhang, Hongyi Guo, Shuqi Ke, Boyi Liu, Zhaoran Wang
Tags: LLMAG, LRM
29 Sep 2023

A Benchmark for Learning to Translate a New Language from One Grammar Book
International Conference on Learning Representations (ICLR), 2023
Garrett Tanzer, Mirac Suzgun, Chenguang Xi, Dan Jurafsky, Luke Melas-Kyriazi
28 Sep 2023

Understanding Catastrophic Forgetting in Language Models via Implicit Inference
International Conference on Learning Representations (ICLR), 2023
Suhas Kotha, Jacob Mitchell Springer, Aditi Raghunathan
Tags: CLL
18 Sep 2023

Context is Environment
International Conference on Learning Representations (ICLR), 2023
Sharut Gupta, Stefanie Jegelka, David Lopez-Paz, Kartik Ahuja
18 Sep 2023

Breaking through the learning plateaus of in-context learning in Transformer
International Conference on Machine Learning (ICML), 2023
Jingwen Fu, Tao Yang, Yuwang Wang, Yan Lu, Nanning Zheng
12 Sep 2023
Uncovering mesa-optimization algorithms in Transformers
Johannes von Oswald, Eyvind Niklasson, Maximilian Schlegel, Seijin Kobayashi, Nicolas Zucchet, ..., Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento
11 Sep 2023
An Appraisal-Based Chain-Of-Emotion Architecture for Affective Language Model Game Agents
PLoS ONE, 2023
Maximilian Croissant, Madeleine Frister, Guy Schofield, Cade McCall
Tags: LLMAG
10 Sep 2023

Are Emergent Abilities in Large Language Models just In-Context Learning?
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Sheng Lu, Irina Bigoulaeva, Rachneet Sachdeva, Harish Tayyar Madabushi, Iryna Gurevych
Tags: LRM, ELM, ReLM
04 Sep 2023
Gated recurrent neural networks discover attention
Nicolas Zucchet, Seijin Kobayashi, Yassir Akram, Johannes von Oswald, Maxime Larcher, Angelika Steger, João Sacramento
04 Sep 2023
Adversarial Fine-Tuning of Language Models: An Iterative Optimisation Approach for the Generation and Detection of Problematic Content
Charles O'Neill, Jack Miller, I. Ciucă, Yuan-Sen Ting, Thang Bui
26 Aug 2023
Causal Intersectionality and Dual Form of Gradient Descent for Multimodal Analysis: a Case Study on Hateful Memes
International Conference on Language Resources and Evaluation (LREC), 2023
Yosuke Miyanishi, Minh Le Nguyen
19 Aug 2023

Inductive-bias Learning: Generating Code Models with Large Language Model
Toma Tanaka, Naofumi Emoto, Tsukasa Yumibayashi
Tags: AI4CE
19 Aug 2023

CausalLM is not optimal for in-context learning
International Conference on Learning Representations (ICLR), 2023
Nan Ding, Tomer Levinboim, Jialin Wu, Sebastian Goodman, Radu Soricut
14 Aug 2023

In-Context Learning Learns Label Relationships but Is Not Conventional Learning
International Conference on Learning Representations (ICLR), 2023
Jannik Kossen, Y. Gal, Tom Rainforth
23 Jul 2023

What can a Single Attention Layer Learn? A Study Through the Random Features Lens
Neural Information Processing Systems (NeurIPS), 2023
Hengyu Fu, Tianyu Guo, Yu Bai, Song Mei
Tags: MLT
21 Jul 2023

SINC: Self-Supervised In-Context Learning for Vision-Language Tasks
IEEE International Conference on Computer Vision (ICCV), 2023
Yi-Syuan Chen, Yun-Zhu Song, Cheng Yu Yeo, Bei Liu, Jianlong Fu, Hong-Han Shuai
Tags: VLM, LRM
15 Jul 2023

Large Language Models as General Pattern Machines
Conference on Robot Learning (CoRL), 2023
Suvir Mirchandani, F. Xia, Peter R. Florence, Brian Ichter, Danny Driess, Montse Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, Andy Zeng
Tags: LLMAG
10 Jul 2023
Graph Neural Networks as an Enabler of Terahertz-based Flow-guided Nanoscale Localization over Highly Erroneous Raw Data
IEEE Journal on Selected Areas in Communications (JSAC), 2023
Gerard Calvo Bartra, Filip Lemic, Guillem Pascual, S. Abadal, Jakob Struye, Carmen Delgado, Xavier Costa Pérez
09 Jul 2023

Bidirectional Attention as a Mixture of Continuous Word Experts
Conference on Uncertainty in Artificial Intelligence (UAI), 2023
Kevin Christian Wibisono, Yixin Wang
Tags: MoE
08 Jul 2023

One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention
International Conference on Learning Representations (ICLR), 2023
Arvind V. Mahankali, Tatsunori B. Hashimoto, Tengyu Ma
Tags: MLT
07 Jul 2023

Scaling In-Context Demonstrations with Structured Attention
Tianle Cai, Kaixuan Huang, Jason D. Lee, Mengdi Wang
Tags: LRM
05 Jul 2023

Trainable Transformer in Transformer
International Conference on Machine Learning (ICML), 2023
A. Panigrahi, Sadhika Malladi, Mengzhou Xia, Sanjeev Arora
Tags: VLM
03 Jul 2023

Understanding In-Context Learning via Supportive Pretraining Data
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Xiaochuang Han, Daniel Simig, Todor Mihaylov, Yulia Tsvetkov, Asli Celikyilmaz, Tianlu Wang
Tags: AIMat
26 Jun 2023

Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression
Neural Information Processing Systems (NeurIPS), 2023
Allan Raventós, Mansheej Paul, F. Chen, Surya Ganguli
26 Jun 2023

Supervised Pretraining Can Learn In-Context Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2023
Jonathan Lee, Annie Xie, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill
Tags: OffRL
26 Jun 2023

SynerGPT: In-Context Learning for Personalized Drug Synergy Prediction and Drug Design
bioRxiv, 2023
Carl Edwards, Aakanksha Naik, Tushar Khot, Martin D. Burke, Heng Ji, Kyle Lo
19 Jun 2023

Trained Transformers Learn Linear Models In-Context
Journal of Machine Learning Research (JMLR), 2023
Ruiqi Zhang, Spencer Frei, Peter L. Bartlett
16 Jun 2023

TART: A plug-and-play Transformer module for task-agnostic reasoning
Neural Information Processing Systems (NeurIPS), 2023
Kush S. Bhatia, A. Narayan, Chris De Sa, Christopher Ré
Tags: LRM, ReLM, VLM
13 Jun 2023