Training Optimal Large Diffusion Language Models

Main: 21 pages, 21 figures, 5 tables · Bibliography: 4 pages · Appendix: 9 pages
Abstract
We introduce Quokka, the first systematic scaling law for diffusion language models (DLMs), encompassing both compute-constrained and data-constrained regimes and studying the key modeling and optimization design choices. Quokka is a good friend of Chinchilla, with a wider scope. We hope these results will offer short-term practical guidance for DLM training and long-term inspiration for the AI community as a whole.
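The abstract frames Quokka as a Chinchilla-style scaling law extended to diffusion language models. For intuition, the sketch below fits the standard Chinchilla parametrization L(N, D) = E + A/N^alpha + B/D^beta to hypothetical (model size, token count, loss) measurements. Quokka's actual functional form, data, and fitted coefficients are not given in this abstract, so every number and name in the example is an illustrative assumption, not the paper's method.

```python
# Minimal sketch: fitting a Chinchilla-style parametric scaling law
#   L(N, D) = E + A / N**alpha + B / D**beta
# to (model size, training tokens, final loss) observations.
# All data points and initial guesses below are made up for illustration.
import numpy as np
from scipy.optimize import curve_fit

def loss_law(ND, E, A, B, alpha, beta):
    """Predicted loss from parameter count N and training tokens D."""
    N, D = ND
    return E + A / N**alpha + B / D**beta

# Hypothetical training runs: (parameters, tokens, final loss).
N = np.array([1e8, 4e8, 1e9, 4e9, 1e10])
D = np.array([2e9, 8e9, 2e10, 8e10, 2e11])
L = np.array([3.10, 2.72, 2.51, 2.31, 2.18])

# Fit the five coefficients; p0 is a rough Chinchilla-like starting point.
popt, _ = curve_fit(loss_law, (N, D), L,
                    p0=[1.5, 400.0, 400.0, 0.3, 0.3], maxfev=20000)
E, A, B, alpha, beta = popt
print(f"E={E:.3f}, A={A:.1f}, B={B:.1f}, alpha={alpha:.3f}, beta={beta:.3f}")
```

Given such a fitted law, the compute-optimal trade-off in the compute-constrained regime follows by minimizing L(N, C/(6N)) over N for a fixed FLOP budget C (using the common C ≈ 6ND approximation); the data-constrained regime instead fixes D and asks how much repeating or upsizing helps.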
