SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer

Computer Vision and Pattern Recognition (CVPR), 2024
14 December 2024
Zeyang Zhang
Zihan Wang
Xianrui Li
Xingwu Sun
Fangyi Chen
Jiang Liu
Jiadong Wang
Bhiksha Raj
Zicheng Liu
Emad Barsoum
    VLM

Papers citing "SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer"

50 / 116 papers shown
Toward Diffusible High-Dimensional Latent Spaces: A Frequency Perspective
Bolin Lai
Xudong Wang
Saketh Rambhatla
James M. Rehg
Zsolt Kira
Rohit Girdhar
Ishan Misra
DiffM
84
0
0
27 Nov 2025
Decoupling Complexity from Scale in Latent Diffusion Model
Tianxiong Zhong
Xingye Tian
X. Wang
Boyuan Jiang
Xin Tao
Pengfei Wan
DiffM
316
0
0
20 Nov 2025
VAEVQ: Enhancing Discrete Visual Tokenization through Variational Modeling
Sicheng Yang
Xing Hu
Qiang Wu
Dawei Yang
173
0
0
10 Nov 2025
VALA: Learning Latent Anchors for Training-Free and Temporally Consistent
Zhangkai Wu
Xuhui Fan
Zhongyuan Xie
Kaize Shi
Longbing Cao
DiffM, VGen
108
0
0
27 Oct 2025
UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation
Zhengrong Yue
H. Zhang
Xiangyu Zeng
Boyu Chen
Chenting Wang
...
Lu Dong
Kunpeng Du
Yi Wang
Limin Wang
Yali Wang
172
7
0
12 Oct 2025
SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization
Théophane Vallaeys
Jakob Verbeek
Matthieu Cord
DiffM
222
3
0
06 Oct 2025
DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space
Wenkun He
Yuchao Gu
Junyu Chen
Dongyun Zou
Yujun Lin
...
Jincheng Yu
Junsong Chen
Enze Xie
Song Han
Han Cai
209
2
0
29 Sep 2025
AToken: A Unified Tokenizer for Vision
Jiasen Lu
Liangchen Song
Mingze Xu
Byeongjoo Ahn
Yanjun Wang
Chen Chen
Afshin Dehghan
Yinfei Yang
ViT
216
7
0
17 Sep 2025
GPSToken: Gaussian Parameterized Spatially-adaptive Tokenization for Image Representation and Generation
Zhengqiang Zhang
Rongyuan Wu
Lingchen Sun
Lei Zhang
257
2
0
01 Sep 2025
DC-AE 1.5: Accelerating Diffusion Model Convergence with Structured Latent Space
Junyu Chen
Dongyun Zou
Wenkun He
Junsong Chen
Enze Xie
Song Han
Han Cai
160
15
0
01 Aug 2025
Latent Denoising Makes Good Visual Tokenizers
Jiawei Yang
Tianhong Li
Lijie Fan
Yonglong Tian
Yue Wang
157
13
0
21 Jul 2025
Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis
Peng Zheng
Junke Wang
Y. Chang
Yizhou Yu
Rui Ma
Zuxuan Wu
291
3
0
02 Jul 2025
VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption
Tianxiong Zhong
Xingye Tian
Boyuan Jiang
Xuebo Wang
Xin Tao
Pengfei Wan
Zhiwei Zhang
269
2
0
17 May 2025
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Tianwei Xiong
Jun Hao Liew
Zilong Huang
Jiashi Feng
Xihui Liu
332
19
0
11 Apr 2025
Dual Codebook VQ: Enhanced Image Reconstruction with Reduced Codebook Size
Parisa Boodaghi Malidarreh
Jillur Rahman Saurav
T. Pham
Amir Hajighasemi
Anahita Samadi
Saurabh Shrinivas Maydeo
M. Nasr
Jacob M. Luber
219
0
0
13 Mar 2025
Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis
Kai Qiu
Xianrui Li
Jason Kuen
Zeyang Zhang
Xiaohao Xu
Jiuxiang Gu
Yinyi Luo
Bhiksha Raj
Zhe Lin
Marios Savvides
487
6
0
11 Mar 2025
Next Patch Prediction for Autoregressive Visual Generation
Yatian Pang
Peng Jin
Shuo Yang
Bin Lin
Bin Zhu
...
Liuhan Chen
Francis E. H. Tay
Ser-Nam Lim
Harry Yang
Li Yuan
563
20
0
19 Dec 2024
XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation
Xianrui Li
Kai Qiu
Zeyang Zhang
Jason Kuen
Jiuxiang Gu
Jiadong Wang
Zhe Lin
Bhiksha Raj
VLM
405
12
0
02 Dec 2024
Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
Yongxin Zhu
Bing Li
Yifei Xin
Zhihua Xia
Linli Xu
461
40
0
04 Nov 2024
Randomized Autoregressive Visual Generation
Qihang Yu
Ju He
XueQing Deng
Xiaohui Shen
Liang-Chieh Chen
VGen, DiffM
306
83
1
01 Nov 2024
WorldSimBench: Towards Video Generation Models as World Simulators
Yiran Qin
Zhelun Shi
Jiwen Yu
Xijun Wang
Enshen Zhou
...
Lu Sheng
Jing Shao
Junlin Wu
Wanli Ouyang
Ruimao Zhang
EGVM, VGen
500
779
0
23 Oct 2024
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
International Conference on Learning Representations (ICLR), 2024
Lijie Fan
Tianhong Li
Siyang Qin
Yuanzhen Li
Chen Sun
Michael Rubinstein
Deqing Sun
Kaiming He
Yonglong Tian
VLM, DiffM
303
109
0
17 Oct 2024
Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling
Wenze Liu
Le Zhuo
Yi Xin
Sheng Xia
Peng Gao
Xiangyu Yue
207
16
0
14 Oct 2024
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
International Conference on Learning Representations (ICLR), 2024
Sihyun Yu
Sangkyung Kwak
Huiwon Jang
Jongheon Jeong
Jonathan Huang
Jinwoo Shin
Saining Xie
OCL
658
276
0
09 Oct 2024
Restructuring Vector Quantization with the Rotation Trick
International Conference on Learning Representations (ICLR), 2024
Christopher Fifty
Ronald G. Junkins
Dennis Duan
Aniketh Iger
Jerry W. Liu
Ehsan Amid
Sebastian Thrun
Christopher Ré
LLMSV
448
33
0
08 Oct 2024
ImageFolder: Autoregressive Image Generation with Folded Tokens
International Conference on Learning Representations (ICLR), 2024
Xiang Li
Kai Qiu
Hao Chen
Jason Kuen
Jiuxiang Gu
Bhiksha Raj
Zhe Lin
VLM
266
57
0
02 Oct 2024
Emu3: Next-Token Prediction is All You Need
Xinlong Wang
Xiaosong Zhang
Zhengxiong Luo
Quan-Sen Sun
Yufeng Cui
...
Xi Yang
Jingjing Liu
Yonghua Lin
Tiejun Huang
Zhongyuan Wang
MLLM
262
459
0
27 Sep 2024
MaskBit: Embedding-free Image Generation via Bit Tokens
Mark Weber
Lijun Yu
Qihang Yu
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
DiffM
198
70
0
24 Sep 2024
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
Zhuoyan Luo
Fengyuan Shi
Yixiao Ge
Yujiu Yang
Limin Wang
Ying Shan
VLM
558
100
0
06 Sep 2024
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
International Conference on Learning Representations (ICLR), 2024
Jinheng Xie
Weijia Mao
Zechen Bai
David Junhao Zhang
Weihao Wang
Kevin Qinghong Lin
Yuchao Gu
Zhijie Chen
Zhenheng Yang
Mike Zheng Shou
368
428
0
22 Aug 2024
Scalable Autoregressive Image Generation with Mamba
Haopeng Li
Jinyue Yang
Kexin Wang
Xuerui Qiu
Yuhong Chou
Xin Li
Guoqi Li
Mamba
506
24
0
22 Aug 2024
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Chunting Zhou
Lili Yu
Arun Babu
Kushal Tirumala
Michihiro Yasunaga
Leonid Shamis
Jacob Kahn
Xuezhe Ma
Luke Zettlemoyer
Omer Levy
DiffM
259
286
0
20 Aug 2024
TokenPacker: Efficient Visual Projector for Multimodal LLM
Wentong Li
Yuqian Yuan
Jian Liu
Dongqi Tang
Song Wang
Jie Qin
Jianke Zhu
Lei Zhang
MLLM
440
119
0
02 Jul 2024
Autoregressive Image Generation without Vector Quantization
Tianhong Li
Yonglong Tian
He Li
Mingyang Deng
Kaiming He
DiffM
428
466
0
17 Jun 2024
Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%
Lei Zhu
Fangyun Wei
Yanye Lu
Dong Chen
VLM
219
64
0
17 Jun 2024
An Image is Worth 32 Tokens for Reconstruction and Generation
Qihang Yu
Mark Weber
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
VLM, ViT
363
182
0
11 Jun 2024
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Peize Sun
Yi Jiang
Shoufa Chen
Shilong Zhang
Bingyue Peng
Ping Luo
Zehuan Yuan
VLM
489
523
0
10 Jun 2024
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models
Tuomas Kynkaanniemi
M. Aittala
Tero Karras
S. Laine
Timo Aila
J. Lehtinen
197
150
0
11 Apr 2024
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Neural Information Processing Systems (NeurIPS), 2024
Keyu Tian
Yi Jiang
Zehuan Yuan
Bingyue Peng
Liwei Wang
VGen
395
691
0
03 Apr 2024
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Yanwei Li
Yuechen Zhang
Chengyao Wang
Zhisheng Zhong
Yixin Chen
Ruihang Chu
Shaoteng Liu
Jiaya Jia
VLM, MLLM, MoE
378
323
0
27 Mar 2024
Rotary Position Embedding for Vision Transformer
Byeongho Heo
Song Park
Dongyoon Han
Sangdoo Yun
383
123
0
20 Mar 2024
Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling
Computer Vision and Pattern Recognition (CVPR), 2024
Baoquan Zhang
Huaibin Wang
Chuyao Luo
Xutao Li
Guotao Liang
Yunming Ye
Xiaochen Qi
Yao He
231
13
0
15 Mar 2024
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
European Conference on Computer Vision (ECCV), 2024
Liang Chen
Haozhe Zhao
Tianyu Liu
Shuai Bai
Junyang Lin
Chang Zhou
Baobao Chang
MLLM, VLM
315
318
0
11 Mar 2024
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
Gen Luo
Weihao Ye
Yuxin Zhang
Xiawu Zheng
Xiaoshuai Sun
Rongrong Ji
VLM
218
96
0
05 Mar 2024
Fast Timing-Conditioned Latent Audio Diffusion
Zach Evans
CJ Carr
Josiah Taylor
Scott H. Hawley
Jordi Pons
DiffM
494
191
0
07 Feb 2024
Lumiere: A Space-Time Diffusion Model for Video Generation
ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia (SIGGRAPH Asia), 2024
Omer Bar-Tal
Hila Chefer
Omer Tov
Charles Herrmann
Roni Paiss
...
T. Michaeli
Oliver Wang
Deqing Sun
Tali Dekel
Inbar Mosseri
VGen
365
367
0
23 Jan 2024
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
European Conference on Computer Vision (ECCV), 2024
Nanye Ma
Mark Goldstein
M. S. Albergo
Nicholas M. Boffi
Eric Vanden-Eijnden
Saining Xie
DiffM
364
411
0
16 Jan 2024
GIVT: Generative Infinite-Vocabulary Transformers
European Conference on Computer Vision (ECCV), 2023
Michael Tschannen
Cian Eastwood
Fabian Mentzer
326
62
0
04 Dec 2023
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
...
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
VGen
901
1,901
0
25 Nov 2023
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Ziyi Lin
Chris Liu
Renrui Zhang
Shiyang Feng
Longtian Qiu
...
Siyuan Huang
Yichi Zhang
Xuming He
Jiaming Song
Yu Qiao
MLLM, VLM
280
272
0
13 Nov 2023