
ρ-EOS: Training-free Bidirectional Variable-Length Control for Masked Diffusion LLMs

Jingyi Yang
Yuxian Jiang
Jing Shao
Main: 8 pages · 6 figures · 6 tables · Bibliography: 2 pages · Appendix: 1 page
Abstract

Beyond parallel generation and global context modeling, current masked diffusion large language models (masked dLLMs, e.g., LLaDA) suffer from a fundamental limitation: they require a predefined, fixed generation length, which lacks flexibility and forces an inevitable trade-off between output quality and computational efficiency. To address this, we study the denoising dynamics and find that the implicit density (ρ) of end-of-sequence (EOS) tokens serves as a reliable signal of generation sufficiency. In particular, the evolving implicit EOS density during denoising reveals whether the current masked space is excessive or insufficient, thereby indicating the direction in which the generation length should be adjusted. Building on this insight, we propose ρ-EOS, a training-free, single-stage strategy that enables bidirectional variable-length generation for masked dLLMs. Unlike prior two-stage approaches, which require separate length-adjustment and iterative mask-insertion phases and support only unidirectional expansion, ρ-EOS achieves bidirectional length adjustment within a unified denoising process by continuously estimating the implicit EOS density: excessively high density triggers contraction of MASK tokens, while insufficient density induces expansion. Extensive experiments on mathematics and code benchmarks demonstrate that ρ-EOS achieves comparable performance while substantially improving inference efficiency and token utilization. Code is available at this https URL.
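The density-based control rule described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `adjust_length`, the thresholds `high`/`low`, and the fixed `step` size are all hypothetical; the paper's actual estimator and adjustment schedule may differ.

```python
def adjust_length(eos_probs, high=0.35, low=0.05, step=8):
    """Hypothetical sketch of an EOS-density length controller.

    eos_probs: per-position EOS probabilities predicted by the model
    over the currently masked positions at some denoising step.
    Returns the signed number of MASK tokens to insert (+) or remove (-).
    """
    # Implicit EOS density: expected fraction of the masked space that
    # would decode to EOS under the current model distribution.
    rho = sum(eos_probs) / len(eos_probs)
    if rho > high:
        # Much of the masked space is headed toward EOS: the reserved
        # length is excessive, so contract (remove MASK tokens).
        return -step
    if rho < low:
        # Almost no EOS mass: the answer likely needs more room,
        # so expand (insert MASK tokens).
        return step
    # Density in the acceptable band: keep the current length.
    return 0
```

Because the density is re-estimated at every denoising step, the same rule can both shrink an over-allocated canvas and grow an under-allocated one within a single pass, which is what makes the adjustment bidirectional and single-stage.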
