1904.00962
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
1 April 2019
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
ODL
Github (1698★)
Papers citing "Large Batch Optimization for Deep Learning: Training BERT in 76 minutes"
50 / 647 papers shown
Controlling changes to attention logits
Ben Anson
Laurence Aitchison
228
0
0
26 Nov 2025
Advancing Image Classification with Discrete Diffusion Classification Modeling
Omer Belhasin
Shelly Golan
Ran El-Yaniv
Michael Elad
DiffM
285
0
0
25 Nov 2025
A Circular Argument : Does RoPE need to be Equivariant for Vision?
Chase van de Geijn
Timo Lüddecke
Polina Turishcheva
Alexander S. Ecker
235
2
0
11 Nov 2025
Non-Negative Stiefel Approximating Flow: Orthogonalish Matrix Optimization for Interpretable Embeddings
Brian B. Avants
Nicholas J. Tustison
J. Stone
133
0
0
09 Nov 2025
Spin-Adapted Neural Network Wavefunctions in Real Space
Ruichen Li
Yuzhi Liu
Du Jiang
Yixiao Chen
Xuelan Wen
Wenrui Li
Di He
Liwei Wang
Ji Chen
Weiluo Ren
132
2
0
03 Nov 2025
AI Progress Should Be Measured by Capability-Per-Resource, Not Scale Alone: A Framework for Gradient-Guided Resource Allocation in LLMs
David McCoy
Yulun Wu
Zachary Butzin-Dozier
153
1
0
02 Nov 2025
Exploring Landscapes for Better Minima along Valleys
Tong Zhao
Jiacheng Li
Yuanchang Zhou
Guangming Tan
Weile Jia
140
2
0
31 Oct 2025
Relative Scaling Laws for LLMs
William B. Held
David Leo Wright Hall
Abigail Z. Jacobs
Diyi Yang
236
4
0
28 Oct 2025
On Optimal Hyperparameters for Differentially Private Deep Transfer Learning
Aki Rehn
Linzh Zhao
Mikko Heikkilä
Antti Honkela
OT
169
0
0
23 Oct 2025
A Scalable, Causal, and Energy Efficient Framework for Neural Decoding with Spiking Neural Networks
Georgios Mentzelopoulos
Ioannis Asmanis
Konrad Paul Kording
Eva L. Dyer
Kostas Daniilidis
Flavia Vitale
198
0
0
23 Oct 2025
HyperDiffusionFields (HyDiF): Diffusion-Guided Hypernetworks for Learning Implicit Molecular Neural Fields
Sudarshan Babu
Phillip Lo
Xiao Zhang
Aadi Srivastava
Ali Davariashtiyani
Jason Perera
Michael Maire
Aly A. Khan
AI4CE
220
0
0
20 Oct 2025
Seesaw: Accelerating Training by Balancing Learning Rate and Batch Size Scheduling
Alexandru Meterez
Depen Morwani
Jingfeng Wu
Costin-Andrei Oncescu
Cengiz Pehlevan
Sham Kakade
LRM
198
3
0
16 Oct 2025
Noise-Adaptive Layerwise Learning Rates: Accelerating Geometry-Aware Optimization for Deep Neural Network Training
Jie Hao
Xiaochuan Gong
Jie Xu
Z. Wang
Mingrui Liu
AI4CE
207
1
0
15 Oct 2025
DYNAMIX: RL-based Adaptive Batch Size Optimization in Distributed Machine Learning Systems
Yuanjun Dai
Keqiang He
An Wang
162
0
0
09 Oct 2025
Adaptive Memory Momentum via a Model-Based Framework for Deep Learning Optimization
Kristi Topollai
A. Choromańska
ODL
419
1
0
06 Oct 2025
Integrating Offline Pre-Training with Online Fine-Tuning: A Reinforcement Learning Approach for Robot Social Navigation
Run Su
Hao Fu
Shuai Zhou
Yingao Fu
OffRL
OnRL
274
0
0
01 Oct 2025
Conda: Column-Normalized Adam for Training Large Language Models Faster
Junjie Wang
Pan Zhou
Yiming Dong
Huan Li
Jia Li
Xun Zhou
Qicheng Lao
Cong Fang
Zhouchen Lin
AI4CE
297
2
0
29 Sep 2025
Data-Efficient Training by Evolved Sampling
Ziheng Cheng
Zhong Li
Jiang Bian
190
1
0
27 Sep 2025
Development of Deep Learning Optimizers: Approaches, Concepts, and Update Rules
Doğay Altınel
202
1
0
22 Sep 2025
Profiling LoRA/QLoRA Fine-Tuning Efficiency on Consumer GPUs: An RTX 4060 Case Study
MSR Avinash
100
0
0
07 Sep 2025
On Using Large-Batches in Federated Learning
Sahil Tyagi
FedML
149
0
0
05 Sep 2025
MERIT: Maximum-normalized Element-wise Ratio for Language Model Large-batch Training
Yang Luo
Zangwei Zheng
Ziheng Qin
Zirui Zhu
Yong Liu
Yang You
ALM
198
0
0
28 Aug 2025
When Routers, Switches and Interconnects Compute: A processing-in-interconnect Paradigm for Scalable Neuromorphic AI
Madhuvanthi Srivatsav R
Chiranjib Bhattacharyya
S. Chakrabartty
Chetan Singh Thakur
171
0
0
27 Aug 2025
Tri-Accel: Curvature-Aware Precision-Adaptive and Memory-Elastic Optimization for Efficient GPU Usage
Mohsen Sheibanian
Pouya Shaeri
Alimohammad Beigi
Ryan T. Woo
Aryan Keluskar
265
0
0
23 Aug 2025
Towards Reliable and Generalizable Differentially Private Machine Learning (Extended Version)
Wenxuan Bao
Vincent Bindschaedler
AAML
318
0
0
21 Aug 2025
MAESTRO: Masked AutoEncoders for Multimodal, Multitemporal, and Multispectral Earth Observation Data
Antoine Labatie
Michael Vaccaro
Nina Lardiere
A. Garioud
Nicolas Gonthier
319
1
0
14 Aug 2025
Learning to See and Act: Task-Aware Virtual View Exploration for Robotic Manipulation
Yongjie Bai
Zhouxia Wang
Wenshu Fan
Weixing Chen
Ziliang Chen
...
Yongsen Zheng
Lingbo Liu
Guanbin Li
Liang Lin
499
1
0
07 Aug 2025
Slice or the Whole Pie? Utility Control for AI Models
Ye Tao
AAML
136
0
0
06 Aug 2025
Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator
YuXin Li
Felix Dangel
Derek Tam
Colin Raffel
326
6
0
24 Jul 2025
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
Martin Marek
Sanae Lotfi
Aditya Somasundaram
A. Wilson
Micah Goldblum
LRM
499
28
0
09 Jul 2025
Both Asymptotic and Non-Asymptotic Convergence of Quasi-Hyperbolic Momentum using Increasing Batch Size
Kento Imaizumi
Hideaki Iiduka
277
0
0
30 Jun 2025
An Adaptive Method Stabilizing Activations for Enhanced Generalization
Hyunseok Seung
Jaewoo Lee
Hyunsuk Ko
ODL
358
0
0
10 Jun 2025
Investigating Mask-aware Prototype Learning for Tabular Anomaly Detection
Ruiying Lu
Jinhan Liu
Chuan Du
D. Guo
OOD
AAML
241
0
0
03 Jun 2025
Taming LLMs by Scaling Learning Rates with Gradient Grouping
Siyuan Li
Juanxi Tian
Zedong Wang
Xin Jin
Zicheng Liu
Wentao Zhang
Dan Xu
284
0
0
01 Jun 2025
SUMO: Subspace-Aware Moment-Orthogonalization for Accelerating Memory-Efficient LLM Training
Yehonathan Refael
Guy Smorodinsky
Tom Tirer
Ofir Lindenbaum
249
11
0
30 May 2025
On the Convergence Analysis of Muon
Wei Shen
Ruichuan Huang
Minhui Huang
Cong Shen
Jiawei Zhang
455
0
0
29 May 2025
DES-LOC: Desynced Low Communication Adaptive Optimizers for Training Foundation Models
Alex Iacob
Lorenzo Sani
M. Safaryan
Paris Giampouras
Samuel Horváth
...
Meghdad Kurmanji
Preslav Aleksandrov
William F. Shen
Xinchi Qiu
Nicholas D. Lane
OffRL
508
2
0
28 May 2025
Deep Learning-Based Forecasting of Boarding Patient Counts to Address ED Overcrowding
Orhun Vural
Bunyamin Ozaydin
Khalid Y. Aram
James Booth
Brittany F. Lindsey
236
0
0
20 May 2025
A Physics-Inspired Optimizer: Velocity Regularized Adam
Pranav Vaidhyanathan
Lucas Schorling
Natalia Ares
Michael A. Osborne
ODL
547
0
0
19 May 2025
Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training
Shane Bergsma
Nolan Dey
Gurpreet Gosal
Gavia Gray
Daria Soboleva
Joel Hestness
506
22
0
19 May 2025
On the $O(\frac{\sqrt{d}}{K^{1/4}})$ Convergence Rate of AdamW Measured by $\ell_1$ Norm
Huan Li
Yiming Dong
Zhouchen Lin
564
0
0
17 May 2025
Pretraining Large Brain Language Model for Active BCI: Silent Speech
Jinzhao Zhou
Zehong Cao
Yiqun Duan
Connor Barkley
Daniel Leong
...
Ziyi Zhao
T. Do
Yu-Cheng Chang
Sheng-Fu Liang
Chin-Teng Lin
562
7
0
29 Apr 2025
AlphaGrad: Non-Linear Gradient Normalization Optimizer
Soham Sane
ODL
442
0
0
22 Apr 2025
Learning Energy-Based Generative Models via Potential Flow: A Variational Principle Approach to Probability Density Homotopy Matching
Junn Yong Loo
Michelle Adeline
Julia Kaiwen Lau
Fang Yu Leong
Hwa Hui Tew
Arghya Pal
Vishnu Monn Baskaran
Chee-Ming Ting
Raphaël C.-W. Phan
BDL
361
2
0
22 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
830
197
0
17 Apr 2025
Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training
Mingyu Liang
Hiwot Tadese Kassa
Wenyin Fu
Brian Coutinho
Louis Feng
Christina Delimitrou
423
4
0
12 Apr 2025
Low-Bit Integerization of Vision Transformers using Operand Reordering for Efficient Hardware
Ching-Yi Lin
Sahil Shah
MQ
342
1
0
11 Apr 2025
Neural Encoding and Decoding at Scale
Yizi Zhang
Yanchen Wang
Mehdi Azabou
Alexandre Andre
Zixuan Wang
Hanrui Lyu
International Brain Laboratory
Eva L. Dyer
Liam Paninski
Cole Hurwitz
AI4CE
635
18
0
11 Apr 2025
The Efficacy of Semantics-Preserving Transformations in Self-Supervised Learning for Medical Ultrasound
Blake Vanberlo
Alexander Wong
Jesse Hoey
R. Arntfield
297
0
0
10 Apr 2025
MultiNeRF: Multiple Watermark Embedding for Neural Radiance Fields
Yash Kulthe
Andrew Gilbert
John Collomosse
383
0
0
03 Apr 2025