2410.05838
Cited By
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
10 January 2025
Oleg Filatov, Jan Ebert, Jiangtao Wang, Stefan Kesselheim
Papers citing "Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit"
2 / 2 papers shown
Title
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Shane Bergsma, Nolan Dey, Gurpreet Gosal, Gavia Gray, Daria Soboleva, Joel Hestness
42
5
0
21 Feb 2025
How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Phillips Foster, Sham Kakade
51
8
0
29 Oct 2024