Scaling Optimal LR Across Token Horizons

30 September 2024
Johan Bjorck, Alon Benhaim, Vishrav Chaudhary, Furu Wei, Xia Song

Papers citing "Scaling Optimal LR Across Token Horizons"

3 / 3 papers shown
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin, Atabak Ashfaq, Adam Atkinson, Hany Awadalla, Nguyen Bach, ..., Ishmam Zabir, Yunan Zhang, Li Zhang, Y. Zhang, Xiren Zhou
MoE
SyDa
68
18
0
03 Mar 2025
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Shane Bergsma, Nolan Dey, Gurpreet Gosal, Gavia Gray, Daria Soboleva, Joel Hestness
42
5
0
21 Feb 2025
How to set AdamW's weight decay as you scale model and dataset size
Xi Wang, Laurence Aitchison
30
9
0
22 May 2024