Next Patch Prediction for Autoregressive Visual Generation

19 December 2024
Yatian Pang
Peng Jin
Shuo Yang
Bin Lin
Bin Zhu
Zhenyu Tang
Liuhan Chen
Francis E. H. Tay
Ser-Nam Lim
Harry Yang
Li Yuan
arXiv: 2412.15321

Papers citing "Next Patch Prediction for Autoregressive Visual Generation"

50 / 112 papers shown
Latent Speech-Text Transformer
Yen-Ju Lu
Yashesh Gaur
Wei Zhou
Benjamin Muller
Jesus Villalba
...
Luke Zettlemoyer
Gargi Ghosh
Mike Lewis
Srinivasan Iyer
Duc Le
VLM
171
5
0
07 Oct 2025
Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation
Xiaoyu Yue
Zidong Wang
Yuqing Wang
Wenlong Zhang
Xihui Liu
Wanli Ouyang
Luping Zhou
GAN
301
3
0
18 Sep 2025
Image Tokenizer Needs Post-Training
Kai Qiu
Xiang Li
Hao Chen
Jason Kuen
Xiaohao Xu
Jiuxiang Gu
Yinyi Luo
Bhiksha Raj
Zhe Lin
Marios Savvides
VLM
241
5
0
15 Sep 2025
Exploiting Discriminative Codebook Prior for Autoregressive Image Generation
Longxiang Tang
Ruihang Chu
Xiang Wang
Yujin Han
Pingyu Wu
Chunming He
Yingya Zhang
Shiwei Zhang
Jiaya Jia
192
4
0
14 Aug 2025
E-4DGS: High-Fidelity Dynamic Reconstruction from the Multi-view Event Cameras
Chaoran Feng
Zhenyu Tang
Wangbo Yu
Yatian Pang
Yian Zhao
Jianbin Zhao
Li Yuan
Yonghong Tian
3DGS
273
4
0
13 Aug 2025
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
Qianli Ma
Yaowei Zheng
Zhelun Shi
Zhongkai Zhao
Bin Jia
...
Y. Li
Jiacheng Yang
Yanghua Peng
Zhi-Li Zhang
Xin Liu
MoE, VLM
407
8
0
04 Aug 2025
EF-VI: Enhancing End-Frame Injection for Video Inbetweening
Liuhan Chen
Xiaodong Cun
Xiaoyu Li
Xianyi He
Shenghai Yuan
Jie Chen
Mingyu Ding
Lichao Sun
VGen
383
0
0
27 May 2025
UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation
Rui Tian
Mingfei Gao
Mingze Xu
Jiaming Hu
Jiasen Lu
Zuxuan Wu
Yinfei Yang
Afshin Dehghan
MLLM, VLM
138
1
0
20 May 2025
MVAR: Visual Autoregressive Modeling with Scale and Spatial Markovian Conditioning
Jinhua Zhang
Wei Long
Minghao Han
Weiyi You
Shuhang Gu
BDL
366
2
0
19 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
1.4K
41
0
05 May 2025
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation
Zhiyuan Yan
Junyan Ye
Weijia Li
Zilong Huang
Shenghai Yuan
Xiangyang He
Kaiqing Lin
Jun-Jian He
Conghui He
Lichao Sun
MLLM, EGVM
615
60
0
03 Apr 2025
NeuralGS: Bridging Neural Fields and 3D Gaussian Splatting for Compact 3D Representations
Zhenyu Tang
Chaoran Feng
Xinhua Cheng
Wangbo Yu
Junwu Zhang
Yuan Liu
Xiaoxiao Long
Wenping Wang
Li Yuan
3DGS
496
13
0
29 Mar 2025
Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis
Kai Qiu
Xianrui Li
Jason Kuen
Zeyang Zhang
Xiaohao Xu
Jiuxiang Gu
Yinyi Luo
Bhiksha Raj
Zhe Lin
Marios Savvides
611
7
0
11 Mar 2025
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
Yuwei Niu
Munan Ning
Mengren Zheng
Weiyang Jin
Bin Lin
...
Jiaqi Liao
Chaoran Feng
Kunpeng Ning
Bin Zhu
Li Yuan
EGVM
606
128
0
10 Mar 2025
Frequency Autoregressive Image Generation with Continuous Tokens
Hu Yu
Hao Luo
Hangjie Yuan
Yu Rong
Feng Zhao
VGen
343
22
0
07 Mar 2025
QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
Yue Zhao
Fuzhao Xue
Scott Reed
Linxi Fan
Yuke Zhu
Jan Kautz
Zhiding Yu
Philipp Krahenbuhl
De-An Huang
MLLM, CLIP, VLM
375
24
0
07 Feb 2025
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
Xiaokang Chen
Zhiyu Wu
Xingchao Liu
Zizheng Pan
Wen Liu
Zhenda Xie
X. Yu
Chong Ruan
AI4TS
630
568
0
29 Jan 2025
ARFlow: Autoregressive Flow with Hybrid Linear Attention
Mude Hui
Rui-jie Zhu
Aaron Courville
Yu Zhang
Zirui Wang
Yuyin Zhou
Nhan Duy Truong
Cihang Xie
371
4
0
27 Jan 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
OffRL, AI4TS, LRM, ReLM, VLM
1.7K
5,342
0
22 Jan 2025
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Peng Jin
Haoyang Li
Li Yuan
Shuicheng Yan
Jie Chen
457
3
0
31 Dec 2024
FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching
Sucheng Ren
Qihang Yu
Ju He
Xiaohui Shen
Alan Yuille
Liang-Chieh Chen
VGen
614
51
0
19 Dec 2024
Parallelized Autoregressive Visual Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Yanjie Wang
Shuhuai Ren
Zhijie Lin
Yujin Han
Haoyuan Guo
Zhenheng Yang
Difan Zou
Jiashi Feng
Xihui Liu
VGen
695
46
0
19 Dec 2024
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
Computer Vision and Pattern Recognition (CVPR), 2024
Zeyang Zhang
Zihan Wang
Xianrui Li
Xingwu Sun
Fangyi Chen
Jiang Liu
Jiadong Wang
Bhiksha Raj
Zicheng Liu
Emad Barsoum
VLM
898
41
0
14 Dec 2024
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
Computer Vision and Pattern Recognition (CVPR), 2024
J. N. Han
Jinlai Liu
Yi Jiang
Bin Yan
Yuqi Zhang
Zehuan Yuan
Xiaobing Liu
353
52
0
05 Dec 2024
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Liao Qu
Huichao Zhang
Yiheng Liu
Xinyu Wang
Yi Jiang
Yiming Gao
Hu Ye
Daniel K. Du
Zehuan Yuan
Xinglong Wu
463
54
0
04 Dec 2024
RandAR: Decoder-only Autoregressive Visual Generation in Random Orders
Computer Vision and Pattern Recognition (CVPR), 2024
Ziqi Pang
Tianyuan Zhang
Fujun Luan
Yunze Man
Hao Tan
Kai Zhang
William T. Freeman
Yu-Xiong Wang
VGen
433
68
0
02 Dec 2024
Open-Sora Plan: Open-Source Large Video Generation Model
Bin Lin
Yunyang Ge
Xinhua Cheng
Zongjian Li
Bin Zhu
...
Zhang Pan
Xing Zhou
Shaoling Dong
Yonghong Tian
Li-xin Yuan
VLM, VGen
526
230
0
28 Nov 2024
Randomized Autoregressive Visual Generation
Qihang Yu
Ju He
XueQing Deng
Xiaohui Shen
Liang-Chieh Chen
VGen, DiffM
376
99
1
01 Nov 2024
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Chengyue Wu
Xiaokang Chen
Z. F. Wu
Yiyang Ma
Xingchao Liu
...
Wen Liu
Zhenda Xie
Xingkai Yu
Chong Ruan
Ping Luo
AI4TS
519
336
0
17 Oct 2024
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective
Neural Information Processing Systems (NeurIPS), 2024
Yongxin Zhu
Bing Li
Hang Zhang
Xin Li
Linli Xu
Lidong Bing
DiffM
341
20
0
16 Oct 2024
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation
Jiatao Gu
Yuyang Wang
Yizhe Zhang
Qihang Zhang
Dinghuai Zhang
Navdeep Jaitly
Josh Susskind
Shuangfei Zhai
DiffM
452
29
0
10 Oct 2024
ImageFolder: Autoregressive Image Generation with Folded Tokens
International Conference on Learning Representations (ICLR), 2024
Xiang Li
Kai Qiu
Hao Chen
Jason Kuen
Jiuxiang Gu
Bhiksha Raj
Zhe Lin
VLM
374
70
0
02 Oct 2024
Emu3: Next-Token Prediction is All You Need
Xinlong Wang
Xiaosong Zhang
Zhengxiong Luo
Quan-Sen Sun
Yufeng Cui
...
Xi Yang
Jingjing Liu
Yonghua Lin
Tiejun Huang
Zhongyuan Wang
MLLM
373
596
0
27 Sep 2024
MaskBit: Embedding-free Image Generation via Bit Tokens
Mark Weber
Lijun Yu
Qihang Yu
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
DiffM
230
80
0
24 Sep 2024
OmniGen: Unified Image Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Shitao Xiao
Yueze Wang
Huaying Yuan
Xingrun Xing
Ruiran Yan
Shuting Wang
Tiejun Huang
Zheng Liu
DiffM, VLM, SyDa
523
299
0
17 Sep 2024
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
Zhuoyan Luo
Fengyuan Shi
Yixiao Ge
Yujiu Yang
Limin Wang
Ying Shan
VLM
712
119
0
06 Sep 2024
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
Liuhan Chen
Zongjian Li
Bin Lin
Bin Zhu
Qian Wang
Shenghai Yuan
X. Zhou
Xinhua Cheng
Li Yuan
DiffM
437
27
0
02 Sep 2024
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
International Conference on Learning Representations (ICLR), 2024
Jinheng Xie
Weijia Mao
Zechen Bai
David Junhao Zhang
Weihao Wang
Kevin Qinghong Lin
Yuchao Gu
Zhijie Chen
Zhenheng Yang
Mike Zheng Shou
533
535
0
22 Aug 2024
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Chunting Zhou
Lili Yu
Arun Babu
Kushal Tirumala
Michihiro Yasunaga
Leonid Shamis
Jacob Kahn
Xuezhe Ma
Luke Zettlemoyer
Omer Levy
DiffM
298
345
0
20 Aug 2024
Autoregressive Image Generation without Vector Quantization
Tianhong Li
Yonglong Tian
He Li
Mingyang Deng
Kaiming He
DiffM
573
552
0
17 Jun 2024
An Image is Worth 32 Tokens for Reconstruction and Generation
Qihang Yu
Mark Weber
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
VLM, ViT
468
236
0
11 Jun 2024
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Peize Sun
Yi Jiang
Shoufa Chen
Shilong Zhang
Bingyue Peng
Ping Luo
Zehuan Yuan
VLM
606
622
0
10 Jun 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon Team
MLLM
723
729
0
16 May 2024
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
Chuofan Ma
Yi Jiang
Jiannan Wu
Zehuan Yuan
Xiaojuan Qi
VLM, ObjD
305
115
0
19 Apr 2024
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Neural Information Processing Systems (NeurIPS), 2024
Keyu Tian
Yi Jiang
Zehuan Yuan
Liwei Wang
VGen
473
835
0
03 Apr 2024
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
...
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
2.9K
3,297
0
05 Mar 2024
Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation
Daiqing Li
Aleks Kamko
Ehsan Akhgari
Ali Sabet
Linmiao Xu
Suhail Doshi
210
221
0
27 Feb 2024
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Bin Lin
Zhenyu Tang
Yang Ye
Jiaxi Cui
Bin Zhu
...
Jinfa Huang
Junwu Zhang
Yatian Pang
Munan Ning
Li-ming Yuan
VLM, MLLM, MoE
499
298
0
29 Jan 2024
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DeepSeek-AI
Xiao Bi
Deli Chen
Guanting Chen
...
Yao Zhao
Shangyan Zhou
Shunfeng Zhou
Qihao Zhu
Yuheng Zou
LRM, ALM
432
689
0
05 Jan 2024
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM, MLLM
346
297
0
28 Dec 2023