
LLaDA2.1: Speeding Up Text Diffusion via Token Editing

Tiwei Bie
Maosong Cao
Xiang Cao
Bingsen Chen
Fuyuan Chen
Kun Chen
Lun Du
Daozhuo Feng
Haibo Feng
Mingliang Gong
Zhuocheng Gong
Yanmei Gu
Jian Guan
Kaiyuan Guan
Hongliang He
Zenan Huang
Juyong Jiang
Zhonghui Jiang
Zhenzhong Lan
Chengxi Li
Jianguo Li
Zehuan Li
Huabin Liu
Lin Liu
Guoshan Lu
Yuan Lu
Yuxin Ma
Xingyu Mou
Zhenxuan Pan
Kaida Qiu
Yuji Ren
Jianfeng Tan
Yiding Tian
Zian Wang
Lanning Wei
Tao Wu
Yipeng Xing
Wentao Ye
Liangyu Zha
Tianze Zhang
Xiaolu Zhang
Junbo Zhao
Da Zheng
Hao Zhong
Wanli Zhong
Jun Zhou
Junlin Zhou
Liwang Zhu
Muzhi Zhu
Yihong Zhuang
Main: 8 pages, 4 figures, 4 tables; Bibliography: 3 pages
Abstract

While LLaDA2.0 demonstrated the scaling potential of 100B-level block-diffusion models and their inherent parallelism, the trade-off between decoding speed and generation quality has remained difficult to balance. We present LLaDA2.1, which is designed to relax this trade-off. By integrating Token-to-Token (T2T) editing into the conventional Mask-to-Token (M2T) scheme, we introduce a joint, configurable threshold-decoding scheme. This design yields two operating modes: Speedy Mode (S Mode), which aggressively lowers the M2T threshold and relies on T2T editing to refine the resulting output, and Quality Mode (Q Mode), which uses conservative thresholds to secure stronger benchmark performance at a modest cost in efficiency. Building on an extended context window, we further implement the first large-scale Reinforcement Learning (RL) framework tailored to dLLMs, supported by dedicated techniques for stable gradient estimation. This alignment improves both reasoning accuracy and instruction-following fidelity, narrowing the gap between diffusion decoding dynamics and complex human intent. We release LLaDA2.1-Mini (16B) and LLaDA2.1-Flash (100B). Across 33 benchmarks, LLaDA2.1 delivers strong task performance together with fast decoding. Despite its 100B scale, on coding tasks it reaches 892 TPS on HumanEval+, 801 TPS on BigCodeBench, and 663 TPS on LiveCodeBench.
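The abstract describes the joint threshold-decoding scheme only at a high level. The sketch below is a minimal illustration, not the released implementation: the MASK sentinel, the decode_step signature, and the specific threshold values are assumptions made for exposition. It shows the core idea of combining an M2T confidence threshold for committing masked positions with a T2T threshold for editing already-committed tokens, and how S Mode and Q Mode differ only in how those thresholds are set.

```python
# Minimal sketch of joint threshold decoding with M2T fill-in and T2T editing.
# Names, thresholds, and the single-step structure are illustrative assumptions.
import numpy as np

MASK = -1  # hypothetical sentinel id for still-masked positions

def decode_step(seq, probs, tau_m2t, tau_t2t):
    """One refinement step over a block.

    seq     : (L,)   current token ids, MASK where undecided
    probs   : (L, V) per-position model probabilities for this step
    tau_m2t : confidence needed to commit a masked position (Mask-to-Token)
    tau_t2t : confidence needed to overwrite a committed token (Token-to-Token)
    """
    out = seq.copy()
    best = probs.argmax(axis=-1)   # most likely token at each position
    conf = probs.max(axis=-1)      # its probability
    masked = seq == MASK

    # M2T: fill masked positions whose confidence clears the threshold.
    fill = masked & (conf >= tau_m2t)
    out[fill] = best[fill]

    # T2T: revisit committed tokens and edit them when the model is
    # sufficiently confident about a different token.
    edit = (~masked) & (best != seq) & (conf >= tau_t2t)
    out[edit] = best[edit]
    return out

# Toy usage: S Mode commits aggressively (low tau_m2t) and leans on T2T edits
# in later steps; Q Mode commits conservatively for fewer, higher-confidence fills.
rng = np.random.default_rng(0)
L, V = 8, 32
probs = rng.dirichlet(np.ones(V) * 0.1, size=L)
seq = np.full(L, MASK)
s_mode = decode_step(seq, probs, tau_m2t=0.3, tau_t2t=0.9)
q_mode = decode_step(seq, probs, tau_m2t=0.8, tau_t2t=0.9)
print("S Mode commits:", (s_mode != MASK).sum(), "| Q Mode commits:", (q_mode != MASK).sum())
```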
