
| Title | Venue |
|---|---|
| A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules | International Conference on Learning Representations (ICLR), 2025 |
| Scaling Optimal LR Across Token Horizons | International Conference on Learning Representations (ICLR), 2024 |
| Critical Influence of Overparameterization on Sharpness-aware Minimization | Conference on Uncertainty in Artificial Intelligence (UAI), 2023 |
| A Simple and Effective Pruning Approach for Large Language Models | International Conference on Learning Representations (ICLR), 2023 |