A Theory for Emergence of Complex Skills in Language Models
arXiv:2307.15936 · 29 July 2023
Sanjeev Arora, Anirudh Goyal
LRM
Papers citing "A Theory for Emergence of Complex Skills in Language Models" (14 of 14 papers shown)
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems? (01 Apr 2025)
Kai Yan, Yufei Xu, Zhengyin Du, Xuesong Yao, Z. Wang, Xiaowen Guo, Jiecao Chen
ReLM, ELM, LRM · 92 · 3 · 0

The Best Instruction-Tuning Data are Those That Fit (06 Feb 2025)
Dylan Zhang, Qirun Dai, Hao Peng
ALM · 115 · 3 · 0

Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data? (27 Jan 2025)
Yutong Yin, Zhaoran Wang
LRM, ReLM · 86 · 0 · 0

Physics of Skill Learning (21 Jan 2025)
Ziming Liu, Yizhou Liu, Eric J. Michaud, Jeff Gore, Max Tegmark
44 · 0 · 0

Out-of-distribution generalization via composition: a lens through induction heads in Transformers (31 Dec 2024)
Jiajun Song, Zhuoyan Xu, Yiqiao Zhong
80 · 4 · 0

Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent (15 Oct 2024)
Bo Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao-quan Song
83 · 19 · 0

Can Models Learn Skill Composition from Examples? (29 Sep 2024)
Haoyu Zhao, Simran Kaur, Dingli Yu, Anirudh Goyal, Sanjeev Arora
CoGe, MoE · 53 · 2 · 0

State space models, emergence, and ergodicity: How many parameters are needed for stable predictions? (20 Sep 2024)
Ingvar M. Ziemann, Nikolai Matni, George J. Pappas
15 · 1 · 0

Learning Beyond Pattern Matching? Assaying Mathematical Understanding in LLMs (24 May 2024)
Siyuan Guo, Aniket Didolkar, Nan Rosemary Ke, Anirudh Goyal, Ferenc Huszár, Bernhard Schölkopf
37 · 3 · 0

Wisdom of the Silicon Crowd: LLM Ensemble Prediction Capabilities Rival Human Crowd Accuracy (29 Feb 2024)
P. Schoenegger, Indre Tuminauskaite, Peter S. Park, Rafael Valdece Sousa Bastos, P. Tetlock
29 · 24 · 0

Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks (21 Nov 2023)
Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona, Robert P. Dick, Hidenori Tanaka
CoGe · 27 · 6 · 0

The Expressibility of Polynomial based Attention Scheme (30 Oct 2023)
Zhao-quan Song, Guangyi Xu, Junze Yin
27 · 5 · 0

How to Protect Copyright Data in Optimization of Large Language Models? (23 Aug 2023)
T. Chu, Zhao-quan Song, Chiwun Yang
28 · 29 · 0

Scaling Laws for Neural Language Models (23 Jan 2020)
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
226 · 4,424 · 0