Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit

10 January 2025

Papers citing "Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit"


Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs. Shane Bergsma, Nolan Dey, Gurpreet Gosal, Gavia Gray, Daria Soboleva, Joel Hestness. 21 Feb 2025.
How Does Critical Batch Size Scale in Pre-training? Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Phillips Foster, Sham Kakade. 29 Oct 2024.