SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer

arXiv: 2412.10958

Computer Vision and Pattern Recognition (CVPR), 2025

14 December 2024

Papers citing "SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer"

50 / 116 papers shown

Toward Diffusible High-Dimensional Latent Spaces: A Frequency Perspective

Saketh Rambhatla

133

0

0

27 Nov 2025

Decoupling Complexity from Scale in Latent Diffusion Model

Tianxiong Zhong

317

1

0

20 Nov 2025

VAEVQ: Enhancing Discrete Visual Tokenization through Variational Modeling

196

0

0

10 Nov 2025

VALA: Learning Latent Anchors for Training-Free and Temporally Consistent

125

0

0

27 Oct 2025

UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation

...

185

7

0

12 Oct 2025

SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization

Théophane Vallaeys

228

3

0

06 Oct 2025

DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space

...

248

2

0

29 Sep 2025

AToken: A Unified Tokenizer for Vision

243

7

0

17 Sep 2025

GPSToken: Gaussian Parameterized Spatially-adaptive Tokenization for Image Representation and Generation

Zhengqiang Zhang

277

2

0

01 Sep 2025

DC-AE 1.5: Accelerating Diffusion Model Convergence with Structured Latent Space

179

16

0

01 Aug 2025

Latent Denoising Makes Good Visual Tokenizers

192

13

0

21 Jul 2025

Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis

341

3

0

02 Jul 2025

VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption

Tianxiong Zhong

303

2

0

17 May 2025

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

363

22

0

11 Apr 2025

Dual Codebook VQ: Enhanced Image Reconstruction with Reduced Codebook Size

Parisa Boodaghi Malidarreh

Jillur Rahman Saurav

Amir Hajighasemi

Saurabh Shrinivas Maydeo

232

0

0

13 Mar 2025

Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis

Marios Savvides

507

6

0

11 Mar 2025

Next Patch Prediction for Autoregressive Visual Generation

...

Francis E. H. Tay

633

21

0

19 Dec 2024

XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation

421

12

0

02 Dec 2024

Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

525

42

0

04 Nov 2024

Randomized Autoregressive Visual Generation

Ju He

Liang-Chieh Chen

326

84

1

01 Nov 2024

WorldSimBench: Towards Video Generation Models as World Simulators

Xijun Wang

...

Wanli Ouyang

550

796

0

23 Oct 2024

Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens

International Conference on Learning Representations (ICLR), 2025

Yuanzhen Li

Michael Rubinstein

330

110

0

17 Oct 2024

Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling

Xiangyu Yue

226

16

0

14 Oct 2024

Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

International Conference on Learning Representations (ICLR), 2025

712

292

0

09 Oct 2024

Restructuring Vector Quantization with the Rotation Trick

International Conference on Learning Representations (ICLR), 2025

Christopher Fifty

Ronald G. Junkins

Sebastian Thrun

Christopher Ré

514

35

0

08 Oct 2024

ImageFolder: Autoregressive Image Generation with Folded Tokens

International Conference on Learning Representations (ICLR), 2025

Xiang Li

Kai Qiu

Bhiksha Raj

298

63

0

02 Oct 2024

Emu3: Next-Token Prediction is All You Need

Xinlong Wang

Xiaosong Zhang

...

Xi Yang

Jingjing Liu

Zhongyuan Wang

290

483

0

27 Sep 2024

MaskBit: Embedding-free Image Generation via Bit Tokens

Daniel Cremers

Liang-Chieh Chen

213

71

0

24 Sep 2024

Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation

Ying Shan

611

103

0

06 Sep 2024

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

International Conference on Learning Representations (ICLR), 2025

David Junhao Zhang

Weihao Wang

Kevin Qinghong Lin

Zhijie Chen

Zhenheng Yang

Mike Zheng Shou

400

439

0

22 Aug 2024

Scalable Autoregressive Image Generation with Mamba

550

25

0

22 Aug 2024

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Lili Yu

Kushal Tirumala

Michihiro Yasunaga

Luke Zettlemoyer

265

291

0

20 Aug 2024

TokenPacker: Efficient Visual Projector for Multimodal LLM

Jian Liu

477

122

0

02 Jul 2024

Autoregressive Image Generation without Vector Quantization

483

480

0

17 Jun 2024

Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%

Lei Zhu

227

63

0

17 Jun 2024

An Image is Worth 32 Tokens for Reconstruction and Generation

Daniel Cremers

Liang-Chieh Chen

391

187

0

11 Jun 2024

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Yi Jiang

Bingyue Peng

543

540

0

10 Jun 2024

Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models

Tuomas Kynkaanniemi

215

154

0

11 Apr 2024

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Neural Information Processing Systems (NeurIPS), 2024

410

715

0

03 Apr 2024

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

409

325

0

27 Mar 2024

Rotary Position Embedding for Vision Transformer

435

131

0

20 Mar 2024

Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling

Computer Vision and Pattern Recognition (CVPR), 2024

263

15

0

15 Mar 2024

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

European Conference on Computer Vision (ECCV), 2024

Shuai Bai

Chang Zhou

Baobao Chang

337

328

0

11 Mar 2024

Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models

236

98

0

05 Mar 2024

Fast Timing-Conditioned Latent Audio Diffusion

Scott H. Hawley

514

192

0

07 Feb 2024

Lumiere: A Space-Time Diffusion Model for Video Generation

ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia (SIGGRAPH Asia), 2024

Charles Herrmann

...

401

381

0

23 Jan 2024

SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers

European Conference on Computer Vision (ECCV), 2024

Nicholas M. Boffi

Eric Vanden-Eijnden

375

423

0

16 Jan 2024

GIVT: Generative Infinite-Vocabulary Transformers

European Conference on Computer Vision (ECCV), 2024

Michael Tschannen

369

63

0

04 Dec 2023

Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

Daniel Mendelevitch

...

Vikram S. Voleti

973

1,953

0

25 Nov 2023

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models

...

Yu Qiao

309

275

0

13 Nov 2023