Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Jiajun Song, Zhuoyan Xu, Yiqiao Zhong
Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2024 · published 31 December 2024 · arXiv: 2408.09503

Cited By
Papers citing "Out-of-distribution generalization via composition: a lens through induction heads in Transformers" (50 of 99 papers shown):
1. Understanding the Staged Dynamics of Transformers in Learning Latent Structure. Rohan Saha, Farzane Aminmansour, Alona Fyshe. 24 Nov 2025. Citations: 0.
2. Can Language Models Compose Skills In-Context? Zidong Liu, Zhuoyan Xu, Zhenmei Shi, Yingyu Liang. 27 Oct 2025. [ReLM, CoGe, LRM]. Citations: 0.
3. Can GRPO Help LLMs Transcend Their Pretraining Origin? Kangqi Ni, Zhen Tan, Zijie Liu, Pingzhi Li, Tianlong Chen. 14 Oct 2025. [OffRL, LRM]. Citations: 0.
4. Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis. Haolin Yang, Hakaze Cho, Naoya Inoue. 29 Sep 2025. Citations: 0.
5. Task Vectors, Learned Not Extracted: Performance Gains and Mechanistic Insight. Haolin Yang, Hakaze Cho, Kaize Ding, Naoya Inoue. 29 Sep 2025. Citations: 0.
6. Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens. Chengshuai Zhao, Zhen Tan, Pingchuan Ma, Dawei Li, Bohan Jiang, Yancheng Wang, Yingzhen Yang, Huan Liu. 02 Aug 2025. [LRM]. Citations: 27.
7. Born a Transformer -- Always a Transformer? On the Effect of Pretraining on Architectural Abilities. Yana Veitsman, Mayank Jobanputra, Yash Sarrof, Aleksandra Bakalova, Vera Demberg, Ellie Pavlick, Michael Hahn. 27 May 2025. Citations: 2.
8. Decomposing Complex Visual Comprehension into Atomic Visual Skills for Vision Language Models. Hyunsik Chae, Seungwoo Yoon, J. Park, Chloe Yewon Chun, Yongin Cho, Mu Cai, Yong Jae Lee, Ernest K. Ryu. 26 May 2025. [CoGe, VLM]. Citations: 3.
9. Characterizing Pattern Matching and Its Limits on Compositional Task Structures. Hoyeon Chang, Jinho Park, Hanseul Cho, Sohee Yang, Miyoung Ko, Hyeonbin Hwang, Seungpil Won, Dohaeng Lee, Youbin Ahn, Minjoon Seo. 26 May 2025. Citations: 1.
10. Unifying Attention Heads and Task Vectors via Hidden State Geometry in In-Context Learning. Haolin Yang, Hakaze Cho, Yiqiao Zhong, Naoya Inoue. 24 May 2025. Citations: 2.
11. ICL CIPHERS: Quantifying "Learning" in In-Context Learning via Substitution Ciphers. Zhouxiang Fang, Aayush Mishra, Muhan Gao, Anqi Liu, Daniel Khashabi. 28 Apr 2025. Citations: 2.
12. Learning to Inference Adaptively for Multimodal Large Language Models. Zhuoyan Xu, Khoi Duc Nguyen, Preeti Mukherjee, Saurabh Bagchi, Somali Chaterji, Yingyu Liang, Yin Li. 13 Mar 2025. [LRM]. Citations: 4.
13. Can In-context Learning Really Generalize to Out-of-distribution Tasks? Qixun Wang, Yifei Wang, Yisen Wang, Xianghua Ying. International Conference on Learning Representations (ICLR), 2024. 13 Oct 2024. [OOD]. Citations: 15.
14. GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models. Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, Mehrdad Farajtabar. International Conference on Learning Representations (ICLR), 2024. 07 Oct 2024. [AIMat, LRM]. Citations: 386.
15. Task Diversity Shortens the ICL Plateau. Jaeyeon Kim, Sehyun Kwon, Joo Young Choi, Jongho Park, Jaewoong Cho, Jason D. Lee, Ernest K. Ryu. 07 Oct 2024. [MoMe]. Citations: 3.
16. Generalization vs. Specialization under Concept Shift. Alex Nguyen, David J. Schwab, Vudtiwat Ngampruetikorn. 23 Sep 2024. [OOD]. Citations: 0.
17. Gemma 2: Improving Open Language Models at a Practical Size. Gemma Team: Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, ..., Noah Fiedel, Armand Joulin, Kathleen Kenealy, Robert Dadashi, Alek Andreev. 31 Jul 2024. [VLM, MoE, OSLM]. Citations: 1,485.
18. Emergence in non-neural models: grokking modular arithmetic via average gradient outer product. Neil Rohit Mallinar, Daniel Beaglehole, Libin Zhu, Adityanarayanan Radhakrishnan, Parthe Pandit, Misha Belkin. 29 Jul 2024. Citations: 14.
19. Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability. Zhuoyan Xu, Zhenmei Shi, Yingyu Liang. 22 Jul 2024. [CoGe, LRM]. Citations: 50.
20. Scaling and evaluating sparse autoencoders. Leo Gao, Tom Dupré la Tour, Henk Tillman, Gabriel Goh, Rajan Troll, Alec Radford, Ilya Sutskever, Jan Leike, Jeffrey Wu. 06 Jun 2024. Citations: 283.
21. Why Larger Language Models Do In-context Learning Differently? Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang. 30 May 2024. Citations: 43.
22. Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization. Boshi Wang, Xiang Yue, Yu-Chuan Su, Huan Sun. 23 May 2024. [LRM]. Citations: 71.
23. Position: Understanding LLMs Requires More Than Statistical Generalization. Patrik Reizinger, Szilvia Ujváry, Anna Mészáros, A. Kerekes, Wieland Brendel, Ferenc Huszár. International Conference on Machine Learning (ICML), 2024. 03 May 2024. Citations: 21.
24. Many-Shot In-Context Learning. Rishabh Agarwal, Avi Singh, Lei M. Zhang, Bernd Bohnet, Luis Rosias, ..., John D. Co-Reyes, Eric Chu, Feryal M. P. Behbahani, Aleksandra Faust, Hugo Larochelle. 17 Apr 2024. [ReLM, OffRL, BDL]. Citations: 172.
25. What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation. Aaditya K. Singh, Ted Moskovitz, Felix Hill, Stephanie C. Y. Chan, Andrew M. Saxe. International Conference on Machine Learning (ICML), 2024. 10 Apr 2024. [AI4CE]. Citations: 53.
26. What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks. Xingwu Chen, Difan Zou. International Conference on Machine Learning (ICML), 2024. 02 Apr 2024. [ViT]. Citations: 20.
27. Gemma: Open Models Based on Gemini Research and Technology. Gemma Team: Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, ..., Armand Joulin, Noah Fiedel, Evan Senter, Alek Andreev, Kathleen Kenealy. 13 Mar 2024. [VLM, LLMAG]. Citations: 805.
28. Machine learning and information theory concepts towards an AI Mathematician. Yoshua Bengio, Nikolay Malkin. Bulletin of the American Mathematical Society (BAMS), 2024. 07 Mar 2024. Citations: 14.
29. Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning. Zhuoyan Xu, Zhenmei Shi, Junyi Wei, Fangzhou Mu, Yin Li, Yingyu Liang. 22 Feb 2024. Citations: 33.
30. OLMo: Accelerating the Science of Language Models. Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Michael Kinney, ..., Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hanna Hajishirzi. 01 Feb 2024. [OSLM]. Citations: 538.
31. In-Context Language Learning: Architectures and Algorithms. Ekin Akyürek, Bailin Wang, Yoon Kim, Jacob Andreas. International Conference on Machine Learning (ICML), 2024. 23 Jan 2024. [LRM, ReLM]. Citations: 74.
32. Successor Heads: Recurring, Interpretable Attention Heads In The Wild. Rhys Gould, Euan Ong, George Ogden, Arthur Conmy. International Conference on Learning Representations (ICLR), 2023. 14 Dec 2023. [LRM]. Citations: 63.
33. The mechanistic basis of data dependence and abrupt learning in an in-context classification task. Gautam Reddy. International Conference on Learning Representations (ICLR), 2023. 03 Dec 2023. Citations: 87.
34. Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking. Kaifeng Lyu, Jikai Jin, Zhiyuan Li, Simon S. Du, Jason D. Lee, Wei Hu. International Conference on Learning Representations (ICLR), 2023. 30 Nov 2023. [AI4CE]. Citations: 52.
35. The Falcon Series of Open Language Models. Ebtesam Almazrouei, Hamza Alobeidli, Abdulaziz Alshamsi, Alessandro Cappelli, Ruxandra-Aimée Cojocaru, ..., Quentin Malartic, Daniele Mazzotta, Badreddine Noune, B. Pannier, Guilherme Penedo. 28 Nov 2023. [AI4TS, ALM]. Citations: 589.
36. The Linear Representation Hypothesis and the Geometry of Large Language Models. Kiho Park, Yo Joong Choe, Victor Veitch. International Conference on Machine Learning (ICML), 2023. 07 Nov 2023. [LLMSV, MILM]. Citations: 307.
37. What Algorithms can Transformers Learn? A Study in Length Generalization. Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran. International Conference on Learning Representations (ICLR), 2023. 24 Oct 2023. Citations: 157.
38. When can transformers reason with abstract symbols? Enric Boix-Adserà, Omid Saremi, Emmanuel Abbe, Samy Bengio, Etai Littwin, Josh Susskind. 15 Oct 2023. [LRM, NAI]. Citations: 19.
39. Circuit Component Reuse Across Tasks in Transformer Language Models. Jack Merullo, Carsten Eickhoff, Ellie Pavlick. International Conference on Learning Representations (ICLR), 2023. 12 Oct 2023. Citations: 94.
40. Mistral 7B. Albert Q. Jiang, Alexandre Sablayrolles, A. Mensch, Chris Bamford, Devendra Singh Chaplot, ..., Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed. 10 Oct 2023. [MoE, LRM]. Citations: 2,871.
41. Uncovering hidden geometry in Transformers via disentangling position and context. Jiajun Song, Yiqiao Zhong. 07 Oct 2023. Citations: 12.
42. A Theory for Emergence of Complex Skills in Language Models. Sanjeev Arora, Anirudh Goyal. 29 Jul 2023. [LRM]. Citations: 98.
43. Overthinking the Truth: Understanding how Language Models Process False Demonstrations. Danny Halawi, Jean-Stanislas Denain, Jacob Steinhardt. International Conference on Learning Representations (ICLR), 2023. 18 Jul 2023. Citations: 72.
44. Llama 2: Open Foundation and Fine-Tuned Chat Models. Hugo Touvron, Louis Martin, Kevin R. Stone, Peter Albert, Amjad Almahairi, ..., Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom. 18 Jul 2023. [AI4MH, ALM]. Citations: 14,923.
45. One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention. Arvind V. Mahankali, Tatsunori B. Hashimoto, Tengyu Ma. International Conference on Learning Representations (ICLR), 2023. 07 Jul 2023. [MLT]. Citations: 139.
46. Teaching Arithmetic to Small Transformers. Nayoung Lee, Kartik K. Sreenivasan, Jason D. Lee, Kangwook Lee, Dimitris Papailiopoulos. 07 Jul 2023. [LRM]. Citations: 113.
47. Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks. Zhaofeng Wu, Linlu Qiu, Alexis Ross, Ekin Akyürek, Boyuan Chen, Bailin Wang, Najoung Kim, Jacob Andreas, Yoon Kim. North American Chapter of the Association for Computational Linguistics (NAACL), 2023. 05 Jul 2023. [LRM, ReLM]. Citations: 290.
48. The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks. Ziqian Zhong, Ziming Liu, Max Tegmark, Jacob Andreas. Neural Information Processing Systems (NeurIPS), 2023. 30 Jun 2023. Citations: 132.
49. Trained Transformers Learn Linear Models In-Context. Ruiqi Zhang, Spencer Frei, Peter L. Bartlett. Journal of Machine Learning Research (JMLR), 2023. 16 Jun 2023. Citations: 273.
50. Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection. Yu Bai, Fan Chen, Haiquan Wang, Caiming Xiong, Song Mei. Neural Information Processing Systems (NeurIPS), 2023. 07 Jun 2023. Citations: 256.