2409.19913
Scaling Optimal LR Across Token Horizons
30 September 2024
Johan Bjorck
Alon Benhaim
Vishrav Chaudhary
Furu Wei
Xia Song
Papers citing "Scaling Optimal LR Across Token Horizons"
3 / 3 papers shown
Title
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin
Atabak Ashfaq
Adam Atkinson
Hany Awadalla
Nguyen Bach
...
Ishmam Zabir
Yunan Zhang
Li Zhang
Y. Zhang
Xiren Zhou
MoE
SyDa
68
18
0
03 Mar 2025
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Shane Bergsma
Nolan Dey
Gurpreet Gosal
Gavia Gray
Daria Soboleva
Joel Hestness
42
5
0
21 Feb 2025
How to set AdamW's weight decay as you scale model and dataset size
Xi Wang
Laurence Aitchison
30
9
0
22 May 2024