Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling

23 May 2024
Shuaipeng Li, Penghao Zhao, Hailin Zhang, Xingwu Sun, Hao Wu, Dian Jiao, Weiyan Wang, Chengjun Liu, Zheng Fang, Jinbao Xue, Yangyu Tao, Bin Cui, Di Wang

Papers citing "Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling"

7 papers shown

Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Oleg Filatov, Jan Ebert, Jiangtao Wang, Stefan Kesselheim
10 Jan 2025

Scaling Laws for Floating Point Quantization Training
X. Sun, Shuaipeng Li, Ruobing Xie, Weidong Han, Kan Wu, ..., Yangyu Tao, Zhanhui Kang, C. Xu, Di Wang, Jie Jiang
MQ, AIFin
05 Jan 2025

Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
X. Sun, Yanfeng Chen, Y. Huang, Ruobing Xie, Jiaqi Zhu, ..., Zhanhui Kang, Yong Yang, Yuhong Liu, Di Wang, Jie Jiang
MoE, ALM, ELM
04 Nov 2024

Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models
Siqi Wang, Zhengyu Chen, Bei Li, Keqing He, Min Zhang, Jingang Wang
08 Oct 2024

On Scaling Up 3D Gaussian Splatting Training
Hexu Zhao, Haoyang Weng, Daohan Lu, Ang Li, Jinyang Li, Aurojit Panda, Saining Xie
3DGS
26 Jun 2024

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL
15 Sep 2016