arXiv:2309.15505
Finite Scalar Quantization: VQ-VAE Made Simple
27 September 2023
Fabian Mentzer
David C. Minnen
E. Agustsson
Michael Tschannen
Papers citing
"Finite Scalar Quantization: VQ-VAE Made Simple"
50 / 118 papers shown
Title
Latent Behavior Diffusion for Sequential Reaction Generation in Dyadic Setting
Minh-Duc Nguyen
Hyung-Jeong Yang
Soo-Hyung Kim
Ji-Eun Shin
Seung-Won Kim
DiffM
21
0
0
12 May 2025
Continuous Visual Autoregressive Generation via Score Maximization
Chenze Shao
Fandong Meng
Jie Zhou
DiffM
21
0
0
12 May 2025
ReactDance: Progressive-Granular Representation for Long-Term Coherent Reactive Dance Generation
Jingzhong Lin
Yuanyuan Qi
Xinru Li
Wenxuan Huang
Xiangfeng Xu
Bangyan Li
Xuejiao Wang
Gaoqi He
24
0
0
08 May 2025
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
Qingkai Fang
Yan Zhou
Shoutao Guo
Shaolei Zhang
Yang Feng
AuLLM
51
0
0
05 May 2025
FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing
Gaoxiang Cong
Liang-Sheng Li
Jiadong Pan
Zhedong Zhang
Amin Beheshti
A. Hengel
Yuankai Qi
Qingming Huang
70
0
0
02 May 2025
Distilling semantically aware orders for autoregressive image generation
Rishav Pramanik
Antoine Poupon
Juan A. Rodriguez
Masih Aminbeidokhti
David Vazquez
Christopher Pal
Zhaozheng Yin
M. Pedersoli
26
0
0
23 Apr 2025
Lightweight Road Environment Segmentation using Vector Quantization
Jiyong Kwag
Alper Yilmaz
Charles Toth
24
0
0
19 Apr 2025
TerraMind: Large-Scale Generative Multimodality for Earth Observation
Johannes Jakubik
Felix Yang
Benedikt Blumenstiel
Erik Scheurer
Rocco Sedona
...
P. Fraccaro
Thomas Brunschwiler
Gabriele Cavallaro
Juan Bernabé-Moreno
Nicolas Longepe
MLLM
VLM
57
2
0
15 Apr 2025
On the Design of Diffusion-based Neural Speech Codecs
Pietro Foti
Andreas Brendel
DiffM
34
0
0
11 Apr 2025
A Streamable Neural Audio Codec with Residual Scalar-Vector Quantization for Real-Time Communication
Xiao-Hang Jiang
Yang Ai
Rui Zheng
Zhen-Hua Ling
31
0
0
09 Apr 2025
One Quantizer is Enough: Toward a Lightweight Audio Codec
Linwei Zhai
H. Ding
Cui Zhao
Fei-Yue Wang
Ge Wang
Wang Zhi
Wei Xi
MQ
27
0
0
07 Apr 2025
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Siyuan Li
L. Zhang
Zedong Wang
Juanxi Tian
Cheng Tan
...
Chang Yu
Qingsong Xie
Haonan Lu
Haoqian Wang
Zhen Lei
46
0
0
01 Apr 2025
VTD-CLIP: Video-to-Text Discretization via Prompting CLIP
Wencheng Zhu
Yuexin Wang
Hongxuan Li
Pengfei Zhu
Q. Hu
CLIP
48
0
0
24 Mar 2025
CODA: Repurposing Continuous VAEs for Discrete Tokenization
Zeyu Liu
Zanlin Ni
Yeguo Hua
Xin Deng
Xiao Ma
Cheng Zhong
Gao Huang
42
0
0
22 Mar 2025
Zero-Shot Styled Text Image Generation, but Make It Autoregressive
Vittorio Pippi
Fabio Quattrini
S. Cascianelli
Alessio Tonioni
Rita Cucchiara
37
0
0
21 Mar 2025
Halton Scheduler For Masked Generative Image Transformer
Victor Besnier
Mickael Chen
David Hurych
Eduardo Valle
Matthieu Cord
47
1
0
21 Mar 2025
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
Y. Wang
Zhijie Lin
Yao Teng
Yuanzhi Zhu
Shuhuai Ren
Jiashi Feng
Xihui Liu
46
0
0
20 Mar 2025
Tokenize Image as a Set
Zigang Geng
Mengde Xu
Han Hu
Shuyang Gu
DiffM
48
0
0
20 Mar 2025
QINCODEC: Neural Audio Compression with Implicit Neural Codebooks
Zineb Lahrichi
Gaëtan Hadjeres
Gaël Richard
Geoffroy Peeters
42
0
0
19 Mar 2025
Quantization-Free Autoregressive Action Transformer
Ziyad Sheebaelhamd
Michael Tschannen
Michael Muehlebach
Claire Vernade
38
0
0
18 Mar 2025
Versatile Physics-based Character Control with Hybrid Latent Representation
Jinseok Bae
Jungdam Won
Donggeun Lim
I. Hwang
Y. Kim
39
0
0
17 Mar 2025
Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization
Kyle Sargent
Kyle Hsu
Justin Johnson
L. Fei-Fei
Jiajun Wu
DiffM
MU
53
2
0
14 Mar 2025
Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis
Kai Qiu
X. Li
Jason Kuen
H. Chen
Xiaohao Xu
Jiuxiang Gu
Yinyi Luo
Bhiksha Raj
Zhe-nan Lin
Marios Savvides
55
0
0
11 Mar 2025
NFIG: Autoregressive Image Generation with Next-Frequency Prediction
Zhihao Huang
Xi Qiu
Yukuo Ma
Yifu Zhou
Chi Zhang
Xuelong Li
VLM
61
1
0
10 Mar 2025
V2Flow: Unifying Visual Tokenization and Large Language Model Vocabularies for Autoregressive Image Generation
Guiwei Zhang
Tianyu Zhang
Mohan Zhou
Yalong Bai
Biye Li
59
0
0
10 Mar 2025
BTFL: A Bayesian-based Test-Time Generalization Method for Internal and External Data Distributions in Federated learning
Yu Zhou
Bingyan Liu
FedML
OOD
TTA
49
0
0
09 Mar 2025
Frequency Autoregressive Image Generation with Continuous Tokens
Hu Yu
Hao Luo
Hangjie Yuan
Yu Rong
Feng Zhao
VGen
37
2
0
07 Mar 2025
Discrete Contrastive Learning for Diffusion Policies in Autonomous Driving
Kalle Kujanpää
Daulet Baimukashev
Farzeen Munir
Shoaib Azam
Tomasz Piotr Kucner
J. Pajarinen
Ville Kyrki
33
0
0
07 Mar 2025
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
X. Wang
Mingqi Jiang
Z. Ma
Ziyu Zhang
S. Liu
...
Zhifei Li
Xie Chen
Lei Xie
Y. Guo
Wei Xue
73
10
0
03 Mar 2025
CAPS: Context-Aware Priority Sampling for Enhanced Imitation Learning in Autonomous Driving
Hamidreza Mirkhani
Behzad Khamidehi
Ehsan Ahmadi
Fazel Arasteh
Mohammed Elmahgiubi
Weize Zhang
Umar Rajguru
Kasra Rezaee
52
0
0
03 Mar 2025
DLF: Extreme Image Compression with Dual-generative Latent Fusion
Naifu Xue
Zhaoyang Jia
Jiahao Li
Bin Li
Yuan Zhang
Yan-Heng Lu
48
1
0
03 Mar 2025
Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture
X. Li
Jianyu Wang
Yuhao Cheng
Yikun Zeng
X. Ren
W. Zhu
Weiming Zhao
Yichao Yan
31
0
0
01 Mar 2025
Discrete Codebook World Models for Continuous Control
Aidan Scannell
Mohammadreza Nakhaei
Kalle Kujanpää
Yi Zhao
Kevin Sebastian Luck
Arno Solin
J. Pajarinen
OffRL
47
0
0
01 Mar 2025
Projection Head is Secretly an Information Bottleneck
Zhuo Ouyang
Kaiwen Hu
Qi Zhang
Yifei Wang
Yisen Wang
37
0
0
01 Mar 2025
Protein Structure Tokenization: Benchmarking and New Recipe
Xinyu Yuan
Zichen Wang
Marcus Collins
Huzefa Rangwala
36
0
0
28 Feb 2025
VaViM and VaVAM: Autonomous Driving through Video Generative Modeling
Florent Bartoccioni
Elias Ramzi
Victor Besnier
Shashanka Venkataramanan
Tuan-Hung Vu
...
Mickael Chen
Éloi Zablocki
Andrei Bursuc
Eduardo Valle
Matthieu Cord
VGen
78
1
0
24 Feb 2025
From Principles to Applications: A Comprehensive Survey of Discrete Tokenizers in Generation, Comprehension, Recommendation, and Information Retrieval
Jian Jia
Jingtong Gao
Ben Xue
Junhao Wang
Qingpeng Cai
Quan Chen
Xiangyu Zhao
Peng Jiang
Kun Gai
OffRL
67
0
0
18 Feb 2025
SyncSpeech: Low-Latency and Efficient Dual-Stream Text-to-Speech based on Temporal Masked Transformer
Zhengyan Sheng
Zhihao Du
Shiliang Zhang
Zhijie Yan
Yexin Yang
Zhenhua Ling
49
1
0
16 Feb 2025
The Case for Cleaner Biosignals: High-fidelity Neural Compressor Enables Transfer from Cleaner iEEG to Noisier EEG
Francesco Stefano Carzaniga
Gary Tom Hoppeler
Michael Hersche
Kaspar Anton Schindler
Abbas Rahimi
38
0
0
10 Feb 2025
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Wei Deng
Siyi Zhou
Jingchen Shu
Jinchao Wang
Lu Wang
VLM
42
1
0
08 Feb 2025
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
Shehzeen Samarah Hussain
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Subhankar Ghosh
Mikyas T. Desta
Roy Fejgin
Rafael Valle
Jason Chun Lok Li
59
2
0
07 Feb 2025
AudioMiXR: Spatial Audio Object Manipulation with 6DoF for Sound Design in Augmented Reality
Brandon Woodard
Margarita Geleta
Joseph J. LaViola Jr.
Andrea Fanelli
Rhonda Wilson
55
2
0
05 Feb 2025
Learning the Language of Protein Structure
Benoit Gaujac
Jérémie Donà
Liviu Copoiu
Timothy Atkinson
Thomas Pierrot
Thomas D. Barrett
51
10
0
08 Jan 2025
CAT: Content-Adaptive Image Tokenization
Junhong Shen
Kushal Tirumala
Michihiro Yasunaga
Ishan Misra
Luke Zettlemoyer
Lili Yu
Chunting Zhou
24
0
0
06 Jan 2025
Advanced Knowledge Transfer: Refined Feature Distillation for Zero-Shot Quantization in Edge Computing
Inpyo Hong
Youngwan Jo
Hyojeong Lee
Sunghyun Ahn
Sanghyun Park
MQ
49
1
0
26 Dec 2024
VidTwin: Video VAE with Decoupled Structure and Dynamics
Yuchi Wang
Junliang Guo
Xinyi Xie
Tianyu He
Xu Sun
Jiang Bian
DRL
VGen
73
3
0
23 Dec 2024
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
H. Chen
Z. Wang
X. Li
X. Sun
Fangyi Chen
Jiang Liu
J. Wang
Bhiksha Raj
Zicheng Liu
Emad Barsoum
VLM
106
6
0
14 Dec 2024
Scaling Transformers for Low-Bitrate High-Quality Speech Coding
Julian Parker
Anton Smirnov
Jordi Pons
CJ Carr
Zack Zukowski
Zach Evans
Xubo Liu
75
9
0
29 Nov 2024
3D-WAG: Hierarchical Wavelet-Guided Autoregressive Generation for High-Fidelity 3D Shapes
Tejaswini Medi
Arianna Rampini
Pradyumna Reddy
P. Jayaraman
M. Keuper
DiffM
79
0
0
28 Nov 2024
Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation
Marco Pasini
J. Nistal
Stefan Lattner
George Fazekas
61
3
0
27 Nov 2024