Towards Unifying Understanding and Generation in the Era of Vision
Foundation Models: A Survey from the Autoregression Perspective

29 October 2024

Guoqi Li

Shanghang Zhang

Papers citing "Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective"

Title
A Unified Low-level Foundation Model for Enhancing Pathology Image Quality Ziyi Liu Zhe Xu Jiabo Ma Wenqaing Li Junlin Hou Fuxiang Huang Xi Wang R. Chan T. Wong Hao Chen DiffM MedIm 107 0 0 01 Sep 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks. Neural Information Processing Systems (NeurIPS), 2024 Jiannan Wu Muyan Zhong Sen Xing Zeqiang Lai Zhaoyang Liu ... Lewei Lu Tong Lu Ping Luo Yu Qiao Jifeng Dai MLLM VLM LRM 695 116 0 03 Jan 2025
BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities. International Conference on Learning Representations (ICLR), 2024 Shaozhe Hao Xuantong Liu Xianbiao Qi Shihao Zhao Bojia Zi Rong Xiao Kai Han Kwan-Yee K. Wong 424 4 0 18 Oct 2024
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation Jiatao Gu Yuyang Wang Yizhe Zhang Qihang Zhang Dinghuai Zhang Navdeep Jaitly Josh Susskind Shuangfei Zhai DiffM 299 25 0 10 Oct 2024
Aria: An Open Multimodal Native Mixture-of-Experts Model Dongxu Li Yudong Liu Haoning Wu Yue Wang Zhiqi Shen ... Lihuan Zhang Hanshu Yan Guoyin Wang Bei Chen Junnan Li MoE 404 114 0 08 Oct 2024
Restructuring Vector Quantization with the Rotation Trick. International Conference on Learning Representations (ICLR), 2024 Christopher Fifty Ronald G. Junkins Dennis Duan Aniketh Iger Jerry W. Liu Ehsan Amid Sebastian Thrun Christopher Ré LLMSV 432 32 0 08 Oct 2024
Pyramidal Flow Matching for Efficient Video Generative Modeling. International Conference on Learning Representations (ICLR), 2024 Yang Jin Zhicheng Sun Ningyuan Li Kun Xu K. Xu ... Nan Zhuang Quzhe Huang Yang Song Yadong Mu Zhouchen Lin VGen 401 184 0 08 Oct 2024
Autoregressive Action Sequence Learning for Robotic Manipulation. IEEE Robotics and Automation Letters (RA-L), 2024 Xinyu Zhang Yuhan Liu Haonan Chang Liam Schramm Abdeslam Boularias 361 31 0 04 Oct 2024
Loong: Generating Minute-level Long Videos with Autoregressive Language Models Yuqing Wang Tianwei Xiong Daquan Zhou Zhijie Lin Yang Zhao Bingyi Kang Jiashi Feng Xihui Liu VGen 312 62 0 03 Oct 2024
ControlAR: Controllable Image Generation with Autoregressive Models. International Conference on Learning Representations (ICLR), 2024 Zongming Li Tianheng Cheng Shoufa Chen Peize Sun Haocheng Shen Longjin Ran Xiaoxin Chen Wenyu Liu Xinggang Wang DiffM 534 36 0 03 Oct 2024
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding. International Conference on Learning Representations (ICLR), 2024 Yao Teng Han Shi Xian Liu Xuefei Ning Guohao Dai Yu Wang Zhenguo Li Xihui Liu 285 39 0 02 Oct 2024
MIO: A Foundation Model on Multimodal Tokens Zekun Wang King Zhu Chunpu Xu Wangchunshu Zhou Jiaheng Liu ... Yuanxing Zhang Ge Zhang Ke Xu Jie Fu Wenhao Huang MLLM AuLLM 350 20 0 26 Sep 2024
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation Zhuoyan Luo Fengyuan Shi Yixiao Ge Yujiu Yang Limin Wang Ying Shan VLM 522 98 0 06 Sep 2024
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining Dongyang Liu Shitian Zhao Le Zhuo Weifeng Lin Ping Luo Xinyue Li Qi Qin Yu Qiao Hongsheng Li Peng Gao MLLM 357 104 0 05 Aug 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM. International Conference on Learning Representations (ICLR), 2024 Shengqiong Wu Hao Fei Xiangtai Li Jiayi Ji Hanwang Zhang Tat-Seng Chua Shuicheng Yan MLLM 446 55 0 07 Jun 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models Chameleon Team MLLM 456 588 0 16 May 2024