2412.10958
Cited By
v3 (latest)
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
Computer Vision and Pattern Recognition (CVPR), 2024
14 December 2024
Zeyang Zhang
Zihan Wang
Xianrui Li
Xingwu Sun
Fangyi Chen
Jiang Liu
Jiadong Wang
Bhiksha Raj
Zicheng Liu
Emad Barsoum
VLM
Papers citing
"SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer"
50 / 116 papers shown
Title
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
Lijun Yu
José Lezama
N. B. Gundavarapu
Luca Versari
Kihyuk Sohn
...
Boqing Gong
Ming-Hsuan Yang
Irfan Essa
David A. Ross
Lu Jiang
403
504
0
09 Oct 2023
Making LLaMA SEE and Draw with SEED Tokenizer
International Conference on Learning Representations (ICLR), 2023
Yuying Ge
Sijie Zhao
Ziyun Zeng
Yixiao Ge
Chen Li
Xintao Wang
Ying Shan
157
176
0
02 Oct 2023
Finite Scalar Quantization: VQ-VAE Made Simple
International Conference on Learning Representations (ICLR), 2023
Fabian Mentzer
David C. Minnen
E. Agustsson
Michael Tschannen
295
340
0
27 Sep 2023
Planting a SEED of Vision in Large Language Model
Yuying Ge
Yixiao Ge
Ziyun Zeng
Xintao Wang
Ying Shan
VLM
MLLM
209
122
0
16 Jul 2023
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
Neural Information Processing Systems (NeurIPS), 2023
Lijun Yu
Yong Cheng
Zhiruo Wang
Vivek Kumar
Wolfgang Macherey
...
Yonatan Bisk
Ming-Hsuan Yang
Kevin Patrick Murphy
Alexander G. Hauptmann
Lu Jiang
MLLM
299
67
0
30 Jun 2023
Straightening Out the Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks
International Conference on Machine Learning (ICML), 2023
Minyoung Huh
Brian Cheung
Pulkit Agrawal
Phillip Isola
MQ
122
86
0
15 May 2023
MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer
IEEE International Conference on Computer Vision (ICCV), 2023
Shanghua Gao
Pan Zhou
Ming-Ming Cheng
Shuicheng Yan
DiffM
922
240
0
25 Mar 2023
EVA-02: A Visual Representation for Neon Genesis
Image and Vision Computing (IVC), 2023
Yuxin Fang
Quan-Sen Sun
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
ViT
CLIP
368
392
0
20 Mar 2023
Stochastic Interpolants: A Unifying Framework for Flows and Diffusions
M. S. Albergo
Nicholas M. Boffi
Eric Vanden-Eijnden
DiffM
1.0K
540
0
15 Mar 2023
Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation
Neural Information Processing Systems (NeurIPS), 2023
Diederik P. Kingma
Ruiqi Gao
DiffM
620
227
0
01 Mar 2023
A Reparameterized Discrete Diffusion Model for Text Generation
Lin Zheng
Jianbo Yuan
Lei Yu
Lingpeng Kong
DiffM
261
112
0
11 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
International Conference on Machine Learning (ICML), 2023
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
1.1K
6,487
0
30 Jan 2023
Scalable Diffusion Models with Transformers
IEEE International Conference on Computer Vision (ICCV), 2022
William S. Peebles
Saining Xie
GNN
1.8K
4,070
0
19 Dec 2022
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis
Computer Vision and Pattern Recognition (CVPR), 2022
Yuchao Gu
Xintao Wang
Yixiao Ge
Ying Shan
Xiaohu Qie
Mike Zheng Shou
DiffM
192
29
0
06 Dec 2022
MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis
Computer Vision and Pattern Recognition (CVPR), 2022
Tianhong Li
Huiwen Chang
Shlok Kumar Mishra
Han Zhang
Dina Katabi
Dilip Krishnan
248
223
0
16 Nov 2022
DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models
International Conference on Learning Representations (ICLR), 2022
Shansan Gong
Mukai Li
Jiangtao Feng
Zhiyong Wu
Lingpeng Kong
390
440
0
17 Oct 2022
Flow Matching for Generative Modeling
International Conference on Learning Representations (ICLR), 2022
Y. Lipman
Ricky T. Q. Chen
Heli Ben-Hamu
Maximilian Nickel
Matt Le
OOD
955
2,724
0
06 Oct 2022
All are Worth Words: A ViT Backbone for Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2022
Fan Bao
Shen Nie
Kaiwen Xue
Yue Cao
Chongxuan Li
Hang Su
Jun Zhu
VLM
509
490
0
25 Sep 2022
Classifier-Free Diffusion Guidance
Jonathan Ho
Tim Salimans
FaML
446
5,196
0
26 Jul 2022
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Neural Information Processing Systems (NeurIPS), 2022
Chitwan Saharia
William Chan
Saurabh Saxena
Lala Li
Jay Whang
...
Raphael Gontijo-Lopes
Tim Salimans
Jonathan Ho
David J Fleet
Mohammad Norouzi
VLM
1.1K
7,380
0
23 May 2022
Video Diffusion Models
Neural Information Processing Systems (NeurIPS), 2022
Jonathan Ho
Tim Salimans
Alexey A. Gritsenko
William Chan
Mohammad Norouzi
David J. Fleet
DiffM
VGen
782
2,171
0
07 Apr 2022
Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
European Conference on Computer Vision (ECCV), 2022
Oran Gafni
Adam Polyak
Oron Ashual
Shelly Sheynin
Devi Parikh
Yaniv Taigman
DiffM
254
595
0
24 Mar 2022
Autoregressive Image Generation using Residual Quantization
Computer Vision and Pattern Recognition (CVPR), 2022
Doyup Lee
Chiheon Kim
Saehoon Kim
Minsu Cho
Wook-Shin Han
VGen
1.0K
564
0
03 Mar 2022
Generative Adversarial Networks
International Conference on Computing Communication and Networking Technologies (ICCCNT), 2021
Gilad Cohen
Raja Giryes
GAN
781
30,299
0
01 Mar 2022
High-Resolution Image Synthesis with Latent Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2021
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
DiffM
1.8K
20,624
0
20 Dec 2021
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
International Conference on Machine Learning (ICML), 2021
Alex Nichol
Prafulla Dhariwal
Aditya A. Ramesh
Pranav Shyam
Pamela Mishkin
Bob McGrew
Ilya Sutskever
Mark Chen
952
4,319
0
20 Dec 2021
PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
Baining Guo
ViT
332
271
0
24 Nov 2021
Masked Autoencoders Are Scalable Vision Learners
Computer Vision and Pattern Recognition (CVPR), 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
1.8K
9,880
0
11 Nov 2021
Vector-quantized Image Modeling with Improved VQGAN
International Conference on Learning Representations (ICLR), 2021
Jiahui Yu
Xin Li
Jing Yu Koh
Han Zhang
Ruoming Pang
James Qin
Alexander Ku
Yuanzhong Xu
Jason Baldridge
Yonghui Wu
ViT
VLM
DRL
447
665
0
09 Oct 2021
SoundStream: An End-to-End Neural Audio Codec
Neil Zeghidour
Alejandro Luebs
Ahmed Omran
Jan Skoglund
Marco Tagliasacchi
AI4TS
399
1,088
0
07 Jul 2021
BEiT: BERT Pre-Training of Image Transformers
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
ViT
788
3,355
0
15 Jun 2021
Score-based Generative Modeling in Latent Space
Neural Information Processing Systems (NeurIPS), 2021
Arash Vahdat
Karsten Kreis
Jan Kautz
DiffM
394
797
0
10 Jun 2021
CogView: Mastering Text-to-Image Generation via Transformers
Neural Information Processing Systems (NeurIPS), 2021
Ming Ding
Zhuoyi Yang
Wenyi Hong
Wendi Zheng
Chang Zhou
...
Junyang Lin
Xu Zou
Zhou Shao
Hongxia Yang
Jie Tang
ViT
VLM
357
912
0
26 May 2021
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
International Conference on Machine Learning (ICML), 2021
Vadim Popov
Ivan Vovk
Vladimir Gogoryan
Tasnima Sadekova
Mikhail Kudinov
DiffM
327
651
0
13 May 2021
Emerging Properties in Self-Supervised Vision Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
Mathilde Caron
Hugo Touvron
Ishan Misra
Edouard Grave
Julien Mairal
Piotr Bojanowski
Armand Joulin
1.9K
7,754
0
29 Apr 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
813
3,834
0
20 Apr 2021
Regularizing Generative Adversarial Networks under Limited Data
Computer Vision and Pattern Recognition (CVPR), 2021
Hung-Yu Tseng
Lu Jiang
Ce Liu
Ming-Hsuan Yang
Weilong Yang
GAN
214
159
0
07 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision
International Conference on Machine Learning (ICML), 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
2.0K
40,340
0
26 Feb 2021
Zero-Shot Text-to-Image Generation
International Conference on Machine Learning (ICML), 2021
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
740
5,908
0
24 Feb 2021
Improved Denoising Diffusion Probabilistic Models
International Conference on Machine Learning (ICML), 2021
Alex Nichol
Prafulla Dhariwal
DiffM
640
4,638
0
18 Feb 2021
Taming Transformers for High-Resolution Image Synthesis
Computer Vision and Pattern Recognition (CVPR), 2020
Patrick Esser
Robin Rombach
Bjorn Ommer
ViT
635
3,721
0
17 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
1.3K
54,036
0
22 Oct 2020
Deformable DETR: Deformable Transformers for End-to-End Object Detection
International Conference on Learning Representations (ICLR), 2020
Xizhou Zhu
Weijie Su
Lewei Lu
Bin Li
Xiaogang Wang
Jifeng Dai
ViT
669
6,515
0
08 Oct 2020
Denoising Diffusion Implicit Models
International Conference on Learning Representations (ICLR), 2020
Jiaming Song
Chenlin Meng
Stefano Ermon
VLM
DiffM
1.3K
10,000
0
06 Oct 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
1.6K
7,280
0
20 Jun 2020
Denoising Diffusion Probabilistic Models
Jonathan Ho
Ajay Jain
Pieter Abbeel
DiffM
4.4K
25,188
0
19 Jun 2020
Differentiable Augmentation for Data-Efficient GAN Training
Shengyu Zhao
Zhijian Liu
Ji Lin
Jun-Yan Zhu
Song Han
399
651
0
18 Jun 2020
Decision-Making with Auto-Encoding Variational Bayes
Neural Information Processing Systems (NeurIPS), 2020
Romain Lopez
Pierre Boyeau
Nir Yosef
Michael I. Jordan
Jeffrey Regier
BDL
1.5K
19,430
0
17 Feb 2020
Analyzing and Improving the Image Quality of StyleGAN
Computer Vision and Pattern Recognition (CVPR), 2019
Tero Karras
S. Laine
M. Aittala
Janne Hellsten
J. Lehtinen
Timo Aila
GAN
811
6,544
0
03 Dec 2019
Consistency Regularization for Generative Adversarial Networks
International Conference on Learning Representations (ICLR), 2019
Han Zhang
Zizhao Zhang
Augustus Odena
Honglak Lee
GAN
219
298
0
26 Oct 2019