Fast Autoregressive Video Generation with Diagonal Decoding

18 March 2025
Yang Ye
Junliang Guo
Haoyu Wu
Tianyu He
Tim Pearce
Tabish Rashid
Katja Hofmann
Jiang Bian
Communities: DiffM, VGen
Abstract

Autoregressive Transformer models have demonstrated impressive performance in video generation, but their sequential token-by-token decoding process poses a major bottleneck, particularly for long videos represented by tens of thousands of tokens. In this paper, we propose Diagonal Decoding (DiagD), a training-free inference acceleration algorithm for autoregressively pre-trained models that exploits spatial and temporal correlations in videos. Our method generates tokens along diagonal paths in the spatial-temporal token grid, enabling parallel decoding within each frame as well as partially overlapping across consecutive frames. The proposed algorithm is versatile and adaptive to various generative models and tasks, while providing flexible control over the trade-off between inference speed and visual quality. Furthermore, we propose a cost-effective finetuning strategy that aligns the attention patterns of the model with our decoding order, further mitigating the training-inference gap on small-scale models. Experiments on multiple autoregressive video generation models and datasets demonstrate that DiagD achieves up to 10× speedup compared to naive sequential decoding, while maintaining comparable visual fidelity.
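
The abstract describes decoding tokens along diagonals of the spatial-temporal token grid, so that tokens on the same diagonal can be generated in parallel and consecutive frames can partially overlap. The sketch below is not the authors' implementation; it only illustrates one plausible diagonal schedule under assumed parameters (in particular, the `frame_delay` offset controlling cross-frame overlap is hypothetical).

```python
# Minimal sketch of a diagonal decoding schedule over a (frames, height, width)
# token grid. Tokens grouped into the same step could be decoded in parallel;
# the exact scheduling and overlap rule used by DiagD may differ.

from collections import defaultdict


def diagonal_schedule(num_frames: int, height: int, width: int, frame_delay: int = 1):
    """Group (frame, row, col) token positions by the step at which they
    would be emitted under a diagonal order.

    Within a frame, the token at (row, col) lies on diagonal row + col.
    Frame f starts frame_delay steps after frame f - 1, so the late
    diagonals of one frame overlap with the early diagonals of the next.
    """
    steps = defaultdict(list)
    for f in range(num_frames):
        for r in range(height):
            for c in range(width):
                steps[f * frame_delay + r + c].append((f, r, c))
    # Return groups in decoding order; each group is parallelizable.
    return [steps[s] for s in sorted(steps)]


if __name__ == "__main__":
    for i, group in enumerate(diagonal_schedule(num_frames=2, height=3, width=3, frame_delay=2)):
        print(f"step {i}: {group}")
```

Under this toy schedule the number of sequential steps is roughly (num_frames - 1) * frame_delay + height + width - 1 instead of num_frames * height * width for strictly sequential decoding, which is the source of the reported speedup; smaller frame_delay trades visual quality for faster inference.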

@article{ye2025_2503.14070,
  title={Fast Autoregressive Video Generation with Diagonal Decoding},
  author={Yang Ye and Junliang Guo and Haoyu Wu and Tianyu He and Tim Pearce and Tabish Rashid and Katja Hofmann and Jiang Bian},
  journal={arXiv preprint arXiv:2503.14070},
  year={2025}
}