
Title |
|---|
A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules. International Conference on Learning Representations (ICLR), 2025 |
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs. International Conference on Learning Representations (ICLR), 2025 |
Understanding Emergent Abilities of Language Models from the Loss Perspective. Neural Information Processing Systems (NeurIPS), 2024 |