Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
2502.05178
Cited By
QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
7 February 2025
Yue Zhao
Fuzhao Xue
Scott Reed
Linxi Fan
Yuke Zhu
Jan Kautz
Zhiding Yu
Philipp Krahenbuhl
De-An Huang
MLLM
CLIP
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
HuggingFace (10 upvotes)
Connect (YouTube)
Papers citing
"QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation"
12 / 12 papers shown
VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction
Sinan Du
Jiahao Guo
Bo Li
Shuhao Cui
Zhengzhuo Xu
...
Yongxian Wei
Kun Gai
X. Wang
Kai Wu
C. Yuan
213
0
0
28 Nov 2025
UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation
Chi Zhang
Jiepeng Wang
Y. Wang
Yuanzhi Liang
X. J. Yang
Zuoxin Li
Haibin Huang
Xuelong Li
DiffM
200
0
0
21 Nov 2025
Wave-Particle (Continuous-Discrete) Dualistic Visual Tokenization for Unified Understanding and Generation
Yizhu Chen
Chen Ju
Z. Wang
Shuai Xiao
X. Chen
Jinsong Lan
Xiaoyong Zhu
Ying Chen
138
0
0
03 Nov 2025
UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation
Zhengrong Yue
H. Zhang
Xiangyu Zeng
Boyu Chen
Chenting Wang
...
Lu Dong
Kunpeng Du
Yi Wang
Limin Wang
Yali Wang
185
7
0
12 Oct 2025
Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models
Bowei Chen
Sai Bi
Hao Tan
Chentao Song
Tianyuan Zhang
Zhengqi Li
Yuanjun Xiong
Jianming Zhang
Kai Zhang
212
4
0
29 Sep 2025
WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction
Shaobin Zhuang
Yiwei Guo
Canmiao Fu
Z. Huang
Zeyue Tian
Ying Zhang
Ying Zhang
Chen Li
Yali Wang
ViT
222
2
0
07 Aug 2025
Discrete Tokenization for Multimodal LLMs: A Comprehensive Survey
Jindong Li
Yali Fu
Jiahong Liu
Linxiao Cao
Wei Ji
Menglin Yang
Irwin King
Ming-Hsuan Yang
OffRL
156
3
0
21 Jul 2025
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Haokun Lin
Teng Wang
Yixiao Ge
Yuying Ge
Zhichao Lu
Ying Wei
Gang Qu
Zhenan Sun
Mingyu Ding
MLLM
VLM
432
31
0
08 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
1.1K
31
0
05 May 2025
GeoUni: A Unified Model for Generating Geometry Diagrams, Problems and Problem Solutions
Jo-Ku Cheng
Zeren Zhang
Ran Chen
Jingyang Deng
Ziran Qin
Jinwen Ma
329
6
0
14 Apr 2025
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Dongchao Yang
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
341
15
0
03 Apr 2025
Next Patch Prediction for Autoregressive Visual Generation
Yatian Pang
Peng Jin
Shuo Yang
Bin Lin
Bin Zhu
...
Liuhan Chen
Francis E. H. Tay
Ser-Nam Lim
Harry Yang
Li Yuan
633
21
0
19 Dec 2024
1