Scaling Law with Learning Rate Annealing

20 August 2024
Howe Tissue, Venus Wang, Lu Wang
arXiv: 2408.11029

Papers citing "Scaling Law with Learning Rate Annealing"

10 of 10 citing papers shown

How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
Kairong Luo, Zhenbo Sun, Haodong Wen, Xinyu Shi, Jiarui Cui, Chenyi Dang, Kaifeng Lyu, Wenguang Chen
24 Nov 2025

Mid-Training of Large Language Models: A Survey
Kaixiang Mo, Yuxin Shi, Weiwei Weng, Zhiqiang Zhou, Shuman Liu, Haibo Zhang, Anxiang Zeng
08 Oct 2025

Training Dynamics Impact Post-Training Quantization Robustness
Albert Catalan-Tatjer, Niccolò Ajroldi, Jonas Geiping
07 Oct 2025

Predicting Training Re-evaluation Curves Enables Effective Data Curriculums for LLMs
Shane Bergsma, Nolan Dey, Joel Hestness
29 Sep 2025

Scaling with Collapse: Efficient and Predictable Training of LLM Families
Shane Bergsma, Bin Claire Zhang, Nolan Dey, Shaheer Muhammad, Gurpreet Gosal, Joel Hestness
29 Sep 2025

Learning Dynamics in Continual Pre-Training for Large Language Models
Xingjin Wang, Howe Tissue, Lu Wang, Linjing Li, D. Zeng
12 May 2025

A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules (ICLR 2025)
Kairong Luo, Haodong Wen, Shengding Hu, Zhenbo Sun, Zhiyuan Liu, Maosong Sun, Kaifeng Lyu, Wenguang Chen
17 Mar 2025

Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs (ICLR 2025)
Shane Bergsma, Nolan Dey, Gurpreet Gosal, Gavia Gray, Daria Soboleva, Joel Hestness
21 Feb 2025

Scaling Laws for Predicting Downstream Performance in LLMs
Yangyi Chen, Binxuan Huang, Yifan Gao, Zhengyang Wang, Jingfeng Yang, Heng Ji
11 Oct 2024

Understanding Emergent Abilities of Language Models from the Loss Perspective (NeurIPS 2024)
Zhengxiao Du, Aohan Zeng, Yuxiao Dong, Jie Tang
23 Mar 2024