arXiv: 2507.07101
Versions: v1, v2 (latest)
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
9 July 2025
Martin Marek
Sanae Lotfi
Aditya Somasundaram
A. Wilson
Micah Goldblum
LRM
ArXiv (abs)
PDF
HTML
HuggingFace (2 upvotes)
Papers citing
"Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful"
Pre-training under infinite compute
Konwoo Kim
Suhas Kotha
Percy Liang
Tatsunori Hashimoto
18 Sep 2025