Neighboring Autoregressive Modeling for Efficient Visual Generation

12 March 2025
Yefei He
Yuanyu He
Shaoxuan He
Feng Chen
Hong Zhou
K. Zhang
Bohan Zhuang
Abstract

Visual autoregressive models typically adhere to a raster-order "next-token prediction" paradigm, which overlooks the spatial and temporal locality inherent in visual content. Specifically, visual tokens exhibit significantly stronger correlations with their spatially or temporally adjacent tokens compared to those that are distant. In this paper, we propose Neighboring Autoregressive Modeling (NAR), a novel paradigm that formulates autoregressive visual generation as a progressive outpainting procedure, following a near-to-far "next-neighbor prediction" mechanism. Starting from an initial token, the remaining tokens are decoded in ascending order of their Manhattan distance from the initial token in the spatial-temporal space, progressively expanding the boundary of the decoded region. To enable parallel prediction of multiple adjacent tokens in the spatial-temporal space, we introduce a set of dimension-oriented decoding heads, each predicting the next token along a mutually orthogonal dimension. During inference, all tokens adjacent to the decoded tokens are processed in parallel, substantially reducing the model forward steps for generation. Experiments on ImageNet 256×256 and UCF101 demonstrate that NAR achieves 2.4× and 8.6× higher throughput respectively, while obtaining superior FID/FVD scores for both image and video generation tasks compared to the PAR-4X approach. When evaluating on the text-to-image generation benchmark GenEval, NAR with 0.8B parameters outperforms Chameleon-7B while using merely 0.4 of the training data. Code is available at this https URL.
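
The near-to-far decoding order described in the abstract can be illustrated with a small sketch. This is not the authors' code: the function name `neighbor_schedule`, the choice of the top-left corner as the initial token, and the grid sizes are all assumptions made for illustration. It only groups token positions by Manhattan distance from the initial token, so that each group corresponds to one parallel decoding step; it does not model the decoding heads themselves.

```python
from collections import defaultdict
from itertools import product


def neighbor_schedule(shape, start=None):
    """Group grid positions by Manhattan distance from `start`.

    `shape` is the token grid size, e.g. (H, W) for an image or
    (T, H, W) for a video. Returns a list of steps, where step d
    holds every position at Manhattan distance d from `start`.
    The default start (the all-zeros corner) is an assumption.
    """
    if start is None:
        start = (0,) * len(shape)
    groups = defaultdict(list)
    for pos in product(*(range(n) for n in shape)):
        dist = sum(abs(p - s) for p, s in zip(pos, start))
        groups[dist].append(pos)
    # Ascending distance: the decoded region grows outward from `start`.
    return [groups[d] for d in sorted(groups)]


# Hypothetical 4x4 image token grid.
image_steps = neighbor_schedule((4, 4))
for step, positions in enumerate(image_steps):
    print(step, positions)

# Hypothetical 2x4x4 video token grid (time, height, width).
video_steps = neighbor_schedule((2, 4, 4))
print(len(image_steps), "parallel steps vs", 4 * 4, "raster steps")
print(len(video_steps), "parallel steps vs", 2 * 4 * 4, "raster steps")
```

Under these assumptions, a 4×4 grid is covered in 7 parallel steps instead of 16 sequential ones, and in general an H×W grid started from a corner needs H + W − 1 steps, which is where the reduction in forward passes comes from.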

@article{he2025_2503.10696,
  title={Neighboring Autoregressive Modeling for Efficient Visual Generation},
  author={Yefei He and Yuanyu He and Shaoxuan He and Feng Chen and Hong Zhou and Kaipeng Zhang and Bohan Zhuang},
  journal={arXiv preprint arXiv:2503.10696},
  year={2025}
}