2412.10958
Cited By
v1
v2
v3 (latest)
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
Computer Vision and Pattern Recognition (CVPR), 2025
14 December 2024
Zeyang Zhang
Zihan Wang
Xianrui Li
Xingwu Sun
Fangyi Chen
Jiang Liu
Jiadong Wang
Bhiksha Raj
Zicheng Liu
Emad Barsoum
VLM
Papers citing
"SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer"
50 / 116 papers shown
Toward Diffusible High-Dimensional Latent Spaces: A Frequency Perspective
Bolin Lai
Xudong Wang
Saketh Rambhatla
James M. Rehg
Zsolt Kira
Rohit Girdhar
Ishan Misra
DiffM
72
0
0
27 Nov 2025
Decoupling Complexity from Scale in Latent Diffusion Model
Tianxiong Zhong
Xingye Tian
X. Wang
Boyuan Jiang
Xin Tao
Pengfei Wan
DiffM
304
0
0
20 Nov 2025
VAEVQ: Enhancing Discrete Visual Tokenization through Variational Modeling
Sicheng Yang
Xing Hu
Qiang Wu
Dawei Yang
161
0
0
10 Nov 2025
VALA: Learning Latent Anchors for Training-Free and Temporally Consistent
Zhangkai Wu
Xuhui Fan
Zhongyuan Xie
Kaize Shi
Longbing Cao
DiffM
VGen
96
0
0
27 Oct 2025
UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation
Zhengrong Yue
H. Zhang
Xiangyu Zeng
Boyu Chen
Chenting Wang
...
Lu Dong
Kunpeng Du
Yi Wang
Limin Wang
Yali Wang
160
7
0
12 Oct 2025
SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization
Théophane Vallaeys
Jakob Verbeek
Matthieu Cord
DiffM
212
3
0
06 Oct 2025
DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space
Wenkun He
Yuchao Gu
Junyu Chen
Dongyun Zou
Yujun Lin
...
Jincheng Yu
Junsong Chen
Enze Xie
Song Han
Han Cai
185
2
0
29 Sep 2025
AToken: A Unified Tokenizer for Vision
Jiasen Lu
Liangchen Song
Mingze Xu
Byeongjoo Ahn
Yanjun Wang
Chen Chen
Afshin Dehghan
Yinfei Yang
ViT
212
7
0
17 Sep 2025
GPSToken: Gaussian Parameterized Spatially-adaptive Tokenization for Image Representation and Generation
Zhengqiang Zhang
Rongyuan Wu
Lingchen Sun
Lei Zhang
249
2
0
01 Sep 2025
DC-AE 1.5: Accelerating Diffusion Model Convergence with Structured Latent Space
Junyu Chen
Dongyun Zou
Wenkun He
Junsong Chen
Enze Xie
Song Han
Han Cai
152
15
0
01 Aug 2025
Latent Denoising Makes Good Visual Tokenizers
Jiawei Yang
Tianhong Li
Lijie Fan
Yonglong Tian
Yue Wang
153
13
0
21 Jul 2025
Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis
Peng Zheng
Junke Wang
Y. Chang
Yizhou Yu
Rui Ma
Zuxuan Wu
287
3
0
02 Jul 2025
VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption
Tianxiong Zhong
Xingye Tian
Boyuan Jiang
Xuebo Wang
Xin Tao
Pengfei Wan
Zhiwei Zhang
269
2
0
17 May 2025
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Tianwei Xiong
Jun Hao Liew
Zilong Huang
Jiashi Feng
Xihui Liu
320
19
0
11 Apr 2025
Dual Codebook VQ: Enhanced Image Reconstruction with Reduced Codebook Size
Parisa Boodaghi Malidarreh
Jillur Rahman Saurav
T. Pham
Amir Hajighasemi
Anahita Samadi
Saurabh Shrinivas Maydeo
M. Nasr
Jacob M. Luber
207
0
0
13 Mar 2025
Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis
Kai Qiu
Xianrui Li
Jason Kuen
Zeyang Zhang
Xiaohao Xu
Jiuxiang Gu
Yinyi Luo
Bhiksha Raj
Zhe Lin
Marios Savvides
475
6
0
11 Mar 2025
Next Patch Prediction for Autoregressive Visual Generation
Yatian Pang
Peng Jin
Shuo Yang
Bin Lin
Bin Zhu
...
Liuhan Chen
Francis E. H. Tay
Ser-Nam Lim
Harry Yang
Li Yuan
535
19
0
19 Dec 2024
XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation
Xianrui Li
Kai Qiu
Zeyang Zhang
Jason Kuen
Jiuxiang Gu
Jiadong Wang
Zhe Lin
Bhiksha Raj
VLM
397
12
0
02 Dec 2024
Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
Yongxin Zhu
Bing Li
Yifei Xin
Zhihua Xia
Linli Xu
449
40
0
04 Nov 2024
Randomized Autoregressive Visual Generation
Qihang Yu
Ju He
XueQing Deng
Xiaohui Shen
Liang-Chieh Chen
VGen
DiffM
294
82
1
01 Nov 2024
WorldSimBench: Towards Video Generation Models as World Simulators
Yiran Qin
Zhelun Shi
Jiwen Yu
Xijun Wang
Enshen Zhou
...
Lu Sheng
Jing Shao
Junlin Wu
Wanli Ouyang
Ruimao Zhang
EGVM
VGen
476
773
0
23 Oct 2024
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
International Conference on Learning Representations (ICLR), 2025
Lijie Fan
Tianhong Li
Siyang Qin
Yuanzhen Li
Chen Sun
Michael Rubinstein
Deqing Sun
Kaiming He
Yonglong Tian
VLM
DiffM
295
109
0
17 Oct 2024
Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling
Wenze Liu
Le Zhuo
Yi Xin
Sheng Xia
Peng Gao
Xiangyu Yue
203
15
0
14 Oct 2024
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
International Conference on Learning Representations (ICLR), 2025
Sihyun Yu
Sangkyung Kwak
Huiwon Jang
Jongheon Jeong
Jonathan Huang
Jinwoo Shin
Saining Xie
OCL
606
276
0
09 Oct 2024
Restructuring Vector Quantization with the Rotation Trick
International Conference on Learning Representations (ICLR), 2025
Christopher Fifty
Ronald G. Junkins
Dennis Duan
Aniketh Iger
Jerry W. Liu
Ehsan Amid
Sebastian Thrun
Christopher Ré
LLMSV
440
33
0
08 Oct 2024
ImageFolder: Autoregressive Image Generation with Folded Tokens
International Conference on Learning Representations (ICLR), 2025
Xiang Li
Kai Qiu
Hao Chen
Jason Kuen
Jiuxiang Gu
Bhiksha Raj
Zhe Lin
VLM
258
57
0
02 Oct 2024
Emu3: Next-Token Prediction is All You Need
Xinlong Wang
Xiaosong Zhang
Zhengxiong Luo
Quan-Sen Sun
Yufeng Cui
...
Xi Yang
Jingjing Liu
Yonghua Lin
Tiejun Huang
Zhongyuan Wang
MLLM
246
454
0
27 Sep 2024
MaskBit: Embedding-free Image Generation via Bit Tokens
Mark Weber
Lijun Yu
Qihang Yu
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
DiffM
194
70
0
24 Sep 2024
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
Zhuoyan Luo
Fengyuan Shi
Yixiao Ge
Yujiu Yang
Limin Wang
Ying Shan
VLM
550
99
0
06 Sep 2024
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
International Conference on Learning Representations (ICLR), 2025
Jinheng Xie
Weijia Mao
Zechen Bai
David Junhao Zhang
Weihao Wang
Kevin Qinghong Lin
Yuchao Gu
Zhijie Chen
Zhenheng Yang
Mike Zheng Shou
348
425
0
22 Aug 2024
Scalable Autoregressive Image Generation with Mamba
Haopeng Li
Jinyue Yang
Kexin Wang
Xuerui Qiu
Yuhong Chou
Xin Li
Guoqi Li
Mamba
474
23
0
22 Aug 2024
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Chunting Zhou
Lili Yu
Arun Babu
Kushal Tirumala
Michihiro Yasunaga
Leonid Shamis
Jacob Kahn
Xuezhe Ma
Luke Zettlemoyer
Omer Levy
DiffM
251
282
0
20 Aug 2024
TokenPacker: Efficient Visual Projector for Multimodal LLM
Wentong Li
Yuqian Yuan
Jian Liu
Dongqi Tang
Song Wang
Jie Qin
Jianke Zhu
Lei Zhang
MLLM
394
118
0
02 Jul 2024
Autoregressive Image Generation without Vector Quantization
Tianhong Li
Yonglong Tian
He Li
Mingyang Deng
Kaiming He
DiffM
424
460
0
17 Jun 2024
Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%
Lei Zhu
Fangyun Wei
Yanye Lu
Dong Chen
VLM
207
64
0
17 Jun 2024
An Image is Worth 32 Tokens for Reconstruction and Generation
Qihang Yu
Mark Weber
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
VLM
ViT
347
181
0
11 Jun 2024
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Peize Sun
Yi Jiang
Shoufa Chen
Shilong Zhang
Bingyue Peng
Ping Luo
Zehuan Yuan
VLM
485
519
0
10 Jun 2024
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models
Tuomas Kynkaanniemi
M. Aittala
Tero Karras
S. Laine
Timo Aila
J. Lehtinen
185
149
0
11 Apr 2024
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Neural Information Processing Systems (NeurIPS), 2024
Keyu Tian
Yi Jiang
Zehuan Yuan
Bingyue Peng
Liwei Wang
VGen
375
687
0
03 Apr 2024
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Yanwei Li
Yuechen Zhang
Chengyao Wang
Zhisheng Zhong
Yixin Chen
Ruihang Chu
Shaoteng Liu
Jiaya Jia
VLM
MLLM
MoE
354
323
0
27 Mar 2024
Rotary Position Embedding for Vision Transformer
Byeongho Heo
Song Park
Dongyoon Han
Sangdoo Yun
367
119
0
20 Mar 2024
Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling
Computer Vision and Pattern Recognition (CVPR), 2024
Baoquan Zhang
Huaibin Wang
Chuyao Luo
Xutao Li
Guotao Liang
Yunming Ye
Xiaochen Qi
Yao He
231
13
0
15 Mar 2024
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
European Conference on Computer Vision (ECCV), 2024
Liang Chen
Haozhe Zhao
Tianyu Liu
Shuai Bai
Junyang Lin
Chang Zhou
Baobao Chang
MLLM
VLM
307
317
0
11 Mar 2024
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
Gen Luo
Weihao Ye
Yuxin Zhang
Xiawu Zheng
Xiaoshuai Sun
Rongrong Ji
VLM
214
95
0
05 Mar 2024
Fast Timing-Conditioned Latent Audio Diffusion
Zach Evans
CJ Carr
Josiah Taylor
Scott H. Hawley
Jordi Pons
DiffM
474
191
0
07 Feb 2024
Lumiere: A Space-Time Diffusion Model for Video Generation
ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia (SIGGRAPH Asia), 2024
Omer Bar-Tal
Hila Chefer
Omer Tov
Charles Herrmann
Roni Paiss
...
T. Michaeli
Oliver Wang
Deqing Sun
Tali Dekel
Inbar Mosseri
VGen
349
366
0
23 Jan 2024
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
European Conference on Computer Vision (ECCV), 2024
Nanye Ma
Mark Goldstein
M. S. Albergo
Nicholas M. Boffi
Eric Vanden-Eijnden
Saining Xie
DiffM
360
408
0
16 Jan 2024
GIVT: Generative Infinite-Vocabulary Transformers
European Conference on Computer Vision (ECCV), 2024
Michael Tschannen
Cian Eastwood
Fabian Mentzer
318
62
0
04 Dec 2023
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
...
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
VGen
865
1,882
0
25 Nov 2023
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Ziyi Lin
Chris Liu
Renrui Zhang
Shiyang Feng
Longtian Qiu
...
Siyuan Huang
Yichi Zhang
Xuming He
Jiaming Song
Yu Qiao
MLLM
VLM
272
272
0
13 Nov 2023