Scaling Law with Learning Rate Annealing

20 August 2024

Papers citing "Scaling Law with Learning Rate Annealing"

How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining Kairong Luo Zhenbo Sun Haodong Wen Xinyu Shi Jiarui Cui Chenyi Dang Kaifeng Lyu Wenguang Chen 135 1 0 24 Nov 2025
Mid-Training of Large Language Models: A Survey Kaixiang Mo Yuxin Shi Weiwei Weng Zhiqiang Zhou Shuman Liu Haibo Zhang Anxiang Zeng LRM 111 0 0 08 Oct 2025
Training Dynamics Impact Post-Training Quantization Robustness Albert Catalan-Tatjer Niccolò Ajroldi Jonas Geiping MQ 133 0 0 07 Oct 2025
Predicting Training Re-evaluation Curves Enables Effective Data Curriculums for LLMs Shane Bergsma Nolan Dey Joel Hestness 146 0 0 29 Sep 2025
Scaling with Collapse: Efficient and Predictable Training of LLM Families Shane Bergsma Bin Claire Zhang Nolan Dey Shaheer Muhammad Gurpreet Gosal Joel Hestness 116 2 0 29 Sep 2025
Learning Dynamics in Continual Pre-Training for Large Language Models Xingjin Wang Howe Tissue Lu Wang Linjing Li D. Zeng CLL 246 3 0 12 May 2025
A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules. International Conference on Learning Representations (ICLR), 2025. Kairong Luo Haodong Wen Shengding Hu Zhenbo Sun Zhiyuan Liu Maosong Sun Kaifeng Lyu Wenguang Chen CLL 227 11 0 17 Mar 2025
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs. International Conference on Learning Representations (ICLR), 2025. Shane Bergsma Nolan Dey Gurpreet Gosal Gavia Gray Daria Soboleva Joel Hestness 277 19 0 21 Feb 2025
Scaling Laws for Predicting Downstream Performance in LLMs Yangyi Chen Binxuan Huang Yifan Gao Zhengyang Wang Jingfeng Yang Heng Ji LRM 312 25 0 11 Oct 2024
Understanding Emergent Abilities of Language Models from the Loss Perspective. Neural Information Processing Systems (NeurIPS), 2024. Zhengxiao Du Aohan Zeng Yuxiao Dong Jie Tang UQCV LRM 347 76 0 23 Mar 2024