2309.15505
Finite Scalar Quantization: VQ-VAE Made Simple
27 September 2023
Fabian Mentzer
David C. Minnen
E. Agustsson
Michael Tschannen
Papers citing "Finite Scalar Quantization: VQ-VAE Made Simple"
50 / 118 papers shown
Title
Factorized Visual Tokenization and Generation
Zechen Bai
Jianxiong Gao
Ziteng Gao
Pichao Wang
Zheng Zhang
Tong He
Mike Zheng Shou
64
3
0
25 Nov 2024
Representation Collapsing Problems in Vector Quantization
Wenhao Zhao
Qiran Zou
Rushi Shah
Dianbo Liu
67
1
0
25 Nov 2024
Extending Video Masked Autoencoders to 128 frames
N. B. Gundavarapu
Luke Friedman
Raghav Goyal
Chaitra Hegde
Eirikur Agustsson
...
Mikhail Sirotenko
Ming Yang
Tobias Weyand
Boqing Gong
Leonid Sigal
72
1
0
20 Nov 2024
ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram
Xiao-Hang Jiang
Hui-Peng Du
Yang Ai
Ye-Xin Lu
Zhen-Hua Ling
28
0
0
18 Nov 2024
ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis
Zanlin Ni
Yulin Wang
Renping Zhou
Yizeng Han
Jiayi Guo
Zhiyuan Liu
Yuan Yao
Gao Huang
50
4
0
11 Nov 2024
Autoregressive Models in Vision: A Survey
Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
...
M. Zhang
Guillermo Sapiro
Jiebo Luo
Ping Luo
Ngai Wong
VGen
46
9
0
08 Nov 2024
Image Understanding Makes for A Good Tokenizer for Image Generation
Luting Wang
Yang Zhao
Zijian Zhang
Jiashi Feng
Si Liu
Bingyi Kang
VLM
26
4
0
07 Nov 2024
Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
Yongxin Zhu
B. Li
Yifei Xin
Linli Xu
36
10
0
04 Nov 2024
MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction
Cheng Tan
Zhenxiao Cao
Zhangyang Gao
Lirong Wu
Siyuan Li
Yufei Huang
Jun-Xiong Xia
Bozhen Hu
Stan Z. Li
38
0
0
04 Nov 2024
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
Shijia Liao
Y. Wang
Tianyu Li
Yifan Cheng
Ruoyi Zhang
Rongzhi Zhou
Yijin Xing
AuLLM
35
10
0
02 Nov 2024
Optimizing Contextual Speech Recognition Using Vector Quantization for Efficient Retrieval
Nikolaos Flemotomos
Roger Hsiao
P. Swietojanski
Takaaki Hori
Dogan Can
Xiaodan Zhuang
44
0
0
01 Nov 2024
Identifying Spatio-Temporal Drivers of Extreme Events
Mohamad Hakam Shams Eddin
Juergen Gall
AI4TS
48
0
0
31 Oct 2024
Disentangling Disentangled Representations: Towards Improved Latent Units via Diffusion Models
Youngjun Jun
Jiwoo Park
Kyobin Choo
Tae Eun Choi
Seong Jae Hwang
CoGe
33
0
0
31 Oct 2024
APCodec+: A Spectrum-Coding-Based High-Fidelity and High-Compression-Rate Neural Audio Codec with Staged Training Paradigm
Hui-Peng Du
Yang Ai
Rui Zheng
Zhen-Hua Ling
33
0
0
30 Oct 2024
Bio2Token: All-atom tokenization of any biomolecular structure with Mamba
Andrew Liu
Axel Elaldi
Nathan Russell
Olivia Viessmann
Mamba
61
2
0
24 Oct 2024
Elucidating the design space of language models for image generation
Xuantong Liu
Shaozhe Hao
Xianbiao Qi
Tianyang Hu
Jun Wang
Rong Xiao
Yuan Yao
VLM
30
3
0
21 Oct 2024
SeisLM: a Foundation Model for Seismic Waveforms
Tianlin Liu
Jannes Münchmeyer
Laura Laurenti
C. Marone
Maarten V. de Hoop
Ivan Dokmanić
VLM
16
4
0
21 Oct 2024
Towards Scalable Semantic Representation for Recommendation
Taolin Zhang
Junwei Pan
J. T. Wang
Yaohua Zha
Tao Dai
...
Xiaoxiang Deng
Yuan Wang
Ming Yue
Jie Jiang
Shu-Tao Xia
42
1
0
12 Oct 2024
ElasticTok: Adaptive Tokenization for Image and Video
Wilson Yan
Matei A. Zaharia
Volodymyr Mnih
Pieter Abbeel
Aleksandra Faust
Hao Liu
VGen
41
6
0
10 Oct 2024
Imitation Learning with Limited Actions via Diffusion Planners and Deep Koopman Controllers
Jianxin Bi
Kelvin Lim
Kaiqi Chen
Yifei Huang
Harold Soh
25
0
0
10 Oct 2024
Vector Grimoire: Codebook-based Shape Generation under Raster Image Supervision
Moritz Feuerpfeil
Marco Cipriano
Gerard de Melo
24
0
0
08 Oct 2024
Restructuring Vector Quantization with the Rotation Trick
Christopher Fifty
Ronald G. Junkins
Dennis Duan
Aniketh Iyer
Jerry W. Liu
Ehsan Amid
Sebastian Thrun
Christopher Ré
LLMSV
38
11
0
08 Oct 2024
Zebra: In-Context and Generative Pretraining for Solving Parametric PDEs
Louis Serrano
Armand K. Koupai
Thomas X. Wang
Pierre Erbacher
Patrick Gallinari
AI4CE
26
3
0
04 Oct 2024
Scaling Large Motion Models with Million-Level Human Motions
Ye Wang
Sipeng Zheng
Bin Cao
Qianshan Wei
Qin Jin
Zongqing Lu
VGen
40
0
0
04 Oct 2024
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation
Liang Chen
Sinan Tan
Zefan Cai
Weichu Xie
Haozhe Zhao
Yichi Zhang
Junyang Lin
Jinze Bai
Tianyu Liu
Baobao Chang
ViT
50
3
0
02 Oct 2024
Restorative Speech Enhancement: A Progressive Approach Using SE and Codec Modules
Hsin-Tien Chiang
Hao Zhang
Yong Xu
Meng Yu
Dong Yu
23
1
0
02 Oct 2024
Denoising with a Joint-Embedding Predictive Architecture
Dengsheng Chen
Jie Hu
Xiaoming Wei
Enhua Wu
DiffM
47
2
0
02 Oct 2024
PerCo (SD): Open Perceptual Compression
Nikolai Korber
Eduard Kromer
Andreas Siebert
S. Hauke
Daniel Mueller-Gritschneder
Björn Schuller
19
3
0
30 Sep 2024
Learning Quantized Adaptive Conditions for Diffusion Models
Yuchen Liang
Yuchuan Tian
Lei Yu
Huao Tang
Jie Hu
Xiangzhong Fang
Hanting Chen
DiffM
32
0
0
26 Sep 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Kai Chen
Yunhao Gou
Runhui Huang
Zhili Liu
Daxin Tan
...
Qun Liu
Jun Yao
Lu Hou
Hang Xu
AuLLM
MLLM
VLM
65
21
0
26 Sep 2024
MaskBit: Embedding-free Image Generation via Bit Tokens
Mark Weber
Lijun Yu
Qihang Yu
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
DiffM
49
30
0
24 Sep 2024
Using High-Level Patterns to Estimate How Humans Predict a Robot will Behave
Sagar Parekh
Lauren Bramblett
N. Bezzo
Dylan P. Losey
32
0
0
20 Sep 2024
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
Yuancheng Wang
Haoyue Zhan
Liwei Liu
Ruihong Zeng
Haotian Guo
Jiachen Zheng
Qiang Zhang
Shunsi Zhang
Zhizheng Wu
23
38
0
01 Sep 2024
AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation
Zanlin Ni
Yulin Wang
Renping Zhou
Rui Lu
Jiayi Guo
Jinyi Hu
Zhiyuan Liu
Yuan Yao
Gao Huang
25
7
0
31 Aug 2024
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Dongyang Liu
Shitian Zhao
Le Zhuo
Weifeng Lin
Xinyue Li
Qi Qin
Yu Qiao
Hongsheng Li
Peng Gao
MLLM
62
48
0
05 Aug 2024
QueST: Self-Supervised Skill Abstractions for Learning Continuous Control
Atharva Mete
Haotian Xue
Albert Wilcox
Yongxin Chen
Animesh Garg
SSL
21
16
0
22 Jul 2024
Quantised Global Autoencoder: A Holistic Approach to Representing Visual Data
Tim Elsner
Paula Usinger
Victor Czech
Gregor Kobsik
Yanjiang He
I. Lim
Leif Kobbelt
39
1
0
16 Jul 2024
Latent Space Imaging
Matheus Souza
Yidan Zheng
Kaizhang Kang
Yogeshwar Nath Mishra
Qiang Fu
Wolfgang Heidrich
48
0
0
09 Jul 2024
Balance of Number of Embedding and their Dimensions in Vector Quantization
Hang Chen
Sankepally Sainath Reddy
Ziwei Chen
Dianbo Liu
32
1
0
06 Jul 2024
Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations
Kunal Dhawan
Nithin Rao Koluguri
Ante Jukić
Ryan Langman
Jagadeesh Balam
Boris Ginsburg
39
1
0
03 Jul 2024
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents
Zihao Wang
Shaofei Cai
Zhancun Mu
Haowei Lin
Ceyao Zhang
Xuejie Liu
Qing Li
Anji Liu
Xiaojian Ma
Yitao Liang
LM&Ro
30
11
0
27 Jun 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Paarth Neekhara
Shehzeen Samarah Hussain
Subhankar Ghosh
Jason Chun Lok Li
Rafael Valle
Rohan Badlani
Boris Ginsburg
50
11
0
25 Jun 2024
Autoregressive Image Generation without Vector Quantization
Tianhong Li
Yonglong Tian
He Li
Mingyang Deng
Kaiming He
DiffM
43
171
0
17 Jun 2024
ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis
Dehua Tao
Daxin Tan
Y. Yeung
Xiao Chen
Tan Lee
30
3
0
13 Jun 2024
An Image is Worth 32 Tokens for Reconstruction and Generation
Qihang Yu
Mark Weber
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
VLM
ViT
44
79
0
11 Jun 2024
Image and Video Tokenization with Binary Spherical Quantization
Yue Zhao
Yuanjun Xiong
Philipp Krahenbuhl
23
17
0
11 Jun 2024
Deep Generative Modeling Reshapes Compression and Transmission: From Efficiency to Resiliency
Jincheng Dai
Xiaoqi Qin
Sixian Wang
Lexi Xu
Kai Niu
Ping Zhang
29
4
0
10 Jun 2024
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
Zhijun Liu
Shuai Wang
Sho Inoue
Qibing Bai
Haizhou Li
DiffM
37
15
0
08 Jun 2024
Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder
Haohan Guo
Fenglong Xie
Dongchao Yang
Hui Lu
Xixin Wu
Helen Meng
48
6
0
05 Jun 2024
iQRL -- Implicitly Quantized Representations for Sample-efficient Reinforcement Learning
Aidan Scannell
Kalle Kujanpää
Yi Zhao
Mohammadreza Nakhaei
Arno Solin
J. Pajarinen
SSL
34
5
0
04 Jun 2024