
Training Optimal Large Diffusion Language Models

Main: 21 pages · Appendix: 9 pages · Bibliography: 4 pages · 21 figures · 5 tables
Abstract

We introduce Quokka, the first systematic scaling law for diffusion language models (DLMs), covering both the compute-constrained and data-constrained regimes and studying the key modeling and optimization design choices. Quokka is a good friend of Chinchilla and offers a wider scope. We hope these results provide short-term practical guidance for DLM training and long-term inspiration for the broader AI community.
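For context, scaling laws in the Chinchilla line of work model final loss as a parametric function of model size and data; the form below is the standard Chinchilla parametrization, shown only as an illustration, since this excerpt does not state Quokka's exact functional form for DLMs.

L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

Here N is the number of parameters, D the number of training tokens, E the irreducible loss, and A, B, \alpha, \beta fitted constants. Under a compute budget C \approx 6ND, minimizing L yields compute-optimal allocations N^{*} \propto C^{a} and D^{*} \propto C^{b} with a + b = 1; a DLM scaling law would fit analogous constants to diffusion-training runs.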
