ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2110.04627
  4. Cited By
Vector-quantized Image Modeling with Improved VQGAN

Vector-quantized Image Modeling with Improved VQGAN

9 October 2021
Jiahui Yu
Xin Li
Jing Yu Koh
Han Zhang
Ruoming Pang
James Qin
Alexander Ku
Yuanzhong Xu
Jason Baldridge
Yonghui Wu
    ViT
    VLM
    DRL
ArXivPDFHTML

Papers citing "Vector-quantized Image Modeling with Improved VQGAN"

50 / 372 papers shown
Title
CoBIT: A Contrastive Bi-directional Image-Text Generation Model
CoBIT: A Contrastive Bi-directional Image-Text Generation Model
Haoxuan You
Mandy Guo
Zhecan Wang
Kai-Wei Chang
Jason Baldridge
Jiahui Yu
DiffM
37
12
0
23 Mar 2023
LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation
LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation
K. Pnvr
Bharat Singh
P. Ghosh
Behjat Siddiquie
David Jacobs
DiffM
22
29
0
22 Mar 2023
SeiT: Storage-Efficient Vision Training with Tokens Using 1% of Pixel
  Storage
SeiT: Storage-Efficient Vision Training with Tokens Using 1% of Pixel Storage
Song Park
Sanghyuk Chun
Byeongho Heo
Wonjae Kim
Sangdoo Yun
VLM
ViT
12
8
0
20 Mar 2023
IRGen: Generative Modeling for Image Retrieval
IRGen: Generative Modeling for Image Retrieval
Yidan Zhang
Ting Zhang
Dong Chen
Yujing Wang
Qi Chen
...
Qi Zhang
Fan Yang
Mao Yang
Q. Liao
B. Guo
3DV
VLM
33
14
0
17 Mar 2023
GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation
GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation
Can Qin
Ning Yu
Chen Xing
Shu Zhen Zhang
Zeyuan Chen
Stefano Ermon
Yun Fu
Caiming Xiong
Ran Xu
DiffM
30
19
0
17 Mar 2023
Regularized Vector Quantization for Tokenized Image Synthesis
Regularized Vector Quantization for Tokenized Image Synthesis
Jiahui Zhang
Fangneng Zhan
Christian Theobalt
Shijian Lu
DiffM
MQ
33
30
0
11 Mar 2023
Vector Quantized Time Series Generation with a Bidirectional Prior Model
Vector Quantized Time Series Generation with a Bidirectional Prior Model
Daesoo Lee
Sara Malacarne
Erlend Aune
BDL
32
25
0
08 Mar 2023
Neural Vector Fields: Implicit Representation by Explicit Learning
Neural Vector Fields: Implicit Representation by Explicit Learning
Xianghui Yang
Guosheng Lin
Zhenghao Chen
Luping Zhou
AI4CE
44
17
0
08 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative
  Language Model
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
23
7
0
06 Mar 2023
StraIT: Non-autoregressive Generation with Stratified Image Transformer
StraIT: Non-autoregressive Generation with Stratified Image Transformer
Shengju Qian
Huiwen Chang
Yuanzhen Li
Zizhao Zhang
Jiaya Jia
Han Zhang
31
10
0
01 Mar 2023
Composer: Creative and Controllable Image Synthesis with Composable
  Conditions
Composer: Creative and Controllable Image Synthesis with Composable Conditions
Lianghua Huang
Di Chen
Yu Liu
Yujun Shen
Deli Zhao
Jingren Zhou
DiffM
20
278
0
20 Feb 2023
Self-Organising Neural Discrete Representation Learning à la Kohonen
Self-Organising Neural Discrete Representation Learning à la Kohonen
Kazuki Irie
Róbert Csordás
Jürgen Schmidhuber
SSL
19
1
0
15 Feb 2023
From paintbrush to pixel: A review of deep neural networks in
  AI-generated art
From paintbrush to pixel: A review of deep neural networks in AI-generated art
Anne-Sofie Maerten
Derya Soydaner
30
22
0
14 Feb 2023
VQ3D: Learning a 3D-Aware Generative Model on ImageNet
VQ3D: Learning a 3D-Aware Generative Model on ImageNet
Kyle Sargent
Jing Yu Koh
Han Zhang
Huiwen Chang
Charles Herrmann
Pratul P. Srinivasan
Jiajun Wu
Deqing Sun
24
31
0
14 Feb 2023
Multi-modal Machine Learning in Engineering Design: A Review and Future
  Directions
Multi-modal Machine Learning in Engineering Design: A Review and Future Directions
Binyang Song
Ruilin Zhou
Faez Ahmed
AI4CE
32
40
0
14 Feb 2023
Vector Quantized Wasserstein Auto-Encoder
Vector Quantized Wasserstein Auto-Encoder
Tung-Long Vuong
Trung Le
He Zhao
Chuanxia Zheng
Mehrtash Harandi
Jianfei Cai
Dinh Q. Phung
DRL
35
17
0
12 Feb 2023
Language Quantized AutoEncoders: Towards Unsupervised Text-Image
  Alignment
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment
Hao Liu
Wilson Yan
Pieter Abbeel
26
24
0
02 Feb 2023
Grounding Language Models to Images for Multimodal Inputs and Outputs
Grounding Language Models to Images for Multimodal Inputs and Outputs
Jing Yu Koh
Ruslan Salakhutdinov
Daniel Fried
MLLM
23
117
0
31 Jan 2023
Latent Autoregressive Source Separation
Latent Autoregressive Source Separation
Emilian Postolache
Giorgio Mariani
Michele Mancusi
Andrea Santilli
Luca Cosmo
Emanuele Rodolà
BDL
DRL
10
8
0
09 Jan 2023
Muse: Text-To-Image Generation via Masked Generative Transformers
Muse: Text-To-Image Generation via Masked Generative Transformers
Huiwen Chang
Han Zhang
Jarred Barber
AJ Maschinot
José Lezama
...
Kevin Patrick Murphy
William T. Freeman
Michael Rubinstein
Yuanzhen Li
Dilip Krishnan
DiffM
197
517
0
02 Jan 2023
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for
  Text-to-Video Generation
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
Jay Zhangjie Wu
Yixiao Ge
Xintao Wang
Weixian Lei
Yuchao Gu
Yufei Shi
W. Hsu
Ying Shan
Xiaohu Qie
Mike Zheng Shou
VGen
21
690
0
22 Dec 2022
Towards Neural Variational Monte Carlo That Scales Linearly with System
  Size
Towards Neural Variational Monte Carlo That Scales Linearly with System Size
Or Sharir
G. Chan
Anima Anandkumar
6
4
0
21 Dec 2022
QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity
QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity
Siyu Huang
Jie An
D. Wei
Jiebo Luo
Hanspeter Pfister
DiffM
19
28
0
20 Dec 2022
LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer
LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer
Ning Yu
Chia-Chih Chen
Zeyuan Chen
Rui Meng
Ganglu Wu
P. Josel
Juan Carlos Niebles
Caiming Xiong
Ran Xu
ViT
DiffM
19
6
0
19 Dec 2022
BEATs: Audio Pre-Training with Acoustic Tokenizers
BEATs: Audio Pre-Training with Acoustic Tokenizers
Sanyuan Chen
Yu-Huan Wu
Chengyi Wang
Shujie Liu
Daniel C. Tompkins
Zhuo Chen
Furu Wei
30
253
0
18 Dec 2022
Image Compression with Product Quantized Masked Image Modeling
Image Compression with Product Quantized Masked Image Modeling
Alaaeldin El-Nouby
Matthew Muckley
Karen Ullrich
Ivan Laptev
Jakob Verbeek
Hervé Jégou
MQ
19
31
0
14 Dec 2022
MAGVIT: Masked Generative Video Transformer
MAGVIT: Masked Generative Video Transformer
Lijun Yu
Yong Cheng
Kihyuk Sohn
José Lezama
Han Zhang
...
Alexander G. Hauptmann
Ming-Hsuan Yang
Yuan Hao
Irfan Essa
Lu Jiang
DiffM
VGen
22
223
0
10 Dec 2022
Rethinking the Objectives of Vector-Quantized Tokenizers for Image
  Synthesis
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis
Yuchao Gu
Xintao Wang
Yixiao Ge
Ying Shan
Xiaohu Qie
Mike Zheng Shou
DiffM
24
20
0
06 Dec 2022
M-VADER: A Model for Diffusion with Multimodal Context
M-VADER: A Model for Diffusion with Multimodal Context
Samuel Weinbach
Marco Bellagente
C. Eichenberg
Andrew M. Dai
R. Baldock
Souradeep Nanda
Bjorn Deiseroth
Koen Oostermeijer
H. Teufel
Andres Felipe Cruz Salinas
DiffM
27
11
0
06 Dec 2022
Unified Discrete Diffusion for Simultaneous Vision-Language Generation
Unified Discrete Diffusion for Simultaneous Vision-Language Generation
Minghui Hu
Chuanxia Zheng
Heliang Zheng
Tat-Jen Cham
Chaoyue Wang
Zuopeng Yang
Dacheng Tao
Ponnuthurai Nagaratnam Suganthan
DiffM
18
23
0
27 Nov 2022
MAGE: MAsked Generative Encoder to Unify Representation Learning and
  Image Synthesis
MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis
Tianhong Li
Huiwen Chang
Shlok Kumar Mishra
Han Zhang
Dina Katabi
Dilip Krishnan
19
151
0
16 Nov 2022
Large-Scale Bidirectional Training for Zero-Shot Image Captioning
Large-Scale Bidirectional Training for Zero-Shot Image Captioning
Taehoon Kim
Mark A Marsden
Pyunghwan Ahn
Sangyun Kim
Sihaeng Lee
Alessandra Sala
S. Kim
VLM
27
4
0
13 Nov 2022
Phenaki: Variable Length Video Generation From Open Domain Textual
  Description
Phenaki: Variable Length Video Generation From Open Domain Textual Description
Ruben Villegas
Mohammad Babaeizadeh
Pieter-Jan Kindermans
Hernan Moraldo
Han Zhang
M. Saffar
Santiago Castro
Julius Kunze
D. Erhan
DiffM
VGen
25
370
0
05 Oct 2022
Temporally Consistent Transformers for Video Generation
Temporally Consistent Transformers for Video Generation
Wilson Yan
Danijar Hafner
Stephen James
Pieter Abbeel
DiffM
19
27
0
05 Oct 2022
Progressive Text-to-Image Generation
Progressive Text-to-Image Generation
Zhengcong Fei
Mingyuan Fan
Li Zhu
Junshi Huang
73
4
0
05 Oct 2022
Visual Prompt Tuning for Generative Transfer Learning
Visual Prompt Tuning for Generative Transfer Learning
Kihyuk Sohn
Yuan Hao
José Lezama
Luisa F. Polanía
Huiwen Chang
Han Zhang
Irfan Essa
Lu Jiang
VPVLM
VLM
51
81
0
03 Oct 2022
Make-A-Video: Text-to-Video Generation without Text-Video Data
Make-A-Video: Text-to-Video Generation without Text-Video Data
Uriel Singer
Adam Polyak
Thomas Hayes
Xiaoyue Yin
Jie An
...
Oron Ashual
Oran Gafni
Devi Parikh
Sonal Gupta
Yaniv Taigman
DiffM
VGen
22
1,345
0
29 Sep 2022
MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation
MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation
Chuanxia Zheng
L. Vuong
Jianfei Cai
Dinh Q. Phung
MQ
58
72
0
19 Sep 2022
Improved Masked Image Generation with Token-Critic
Improved Masked Image Generation with Token-Critic
José Lezama
Huiwen Chang
Lu Jiang
Irfan Essa
DiffM
180
43
0
09 Sep 2022
Morphology-preserving Autoregressive 3D Generative Modelling of the
  Brain
Morphology-preserving Autoregressive 3D Generative Modelling of the Brain
Petru-Daniel Tudosiu
W. H. Pinaya
M. Graham
Pedro Borges
Virginia Fernandez
...
Disha Mehra
M. Vella
P. Nachev
Sebastien Ourselin
M. Jorge Cardoso
3DH
DiffM
MedIm
14
19
0
07 Sep 2022
AudioLM: a Language Modeling Approach to Audio Generation
AudioLM: a Language Modeling Approach to Audio Generation
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
...
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
AuLLM
19
565
0
07 Sep 2022
Visual Prompting via Image Inpainting
Visual Prompting via Image Inpainting
Amir Bar
Yossi Gandelsman
Trevor Darrell
Amir Globerson
Alexei A. Efros
VLM
VPVLM
14
200
0
01 Sep 2022
Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis
Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis
Wanshu Fan
Yen-Chun Chen
Dongdong Chen
Yu Cheng
Lu Yuan
Yu-Chiang Frank Wang
DiffM
15
90
0
29 Aug 2022
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers
Zhiliang Peng
Li Dong
Hangbo Bao
QiXiang Ye
Furu Wei
16
303
0
12 Aug 2022
Prompt-to-Prompt Image Editing with Cross Attention Control
Prompt-to-Prompt Image Editing with Cross Attention Control
Amir Hertz
Ron Mokady
J. Tenenbaum
Kfir Aberman
Yael Pritch
Daniel Cohen-Or
DiffM
41
1,689
0
02 Aug 2022
Discrete Key-Value Bottleneck
Discrete Key-Value Bottleneck
Frederik Trauble
Anirudh Goyal
Nasim Rahaman
Michael C. Mozer
Kenji Kawaguchi
Yoshua Bengio
Bernhard Schölkopf
CLL
13
22
0
22 Jul 2022
NUWA-Infinity: Autoregressive over Autoregressive Generation for
  Infinite Visual Synthesis
NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
Chenfei Wu
Jian Liang
Xiaowei Hu
Zhe Gan
Jianfeng Wang
Lijuan Wang
Zicheng Liu
Yuejian Fang
Nan Duan
VGen
10
72
0
20 Jul 2022
Improving Diffusion Model Efficiency Through Patching
Improving Diffusion Model Efficiency Through Patching
Troy Luhman
Eric Luhman
DiffM
9
18
0
09 Jul 2022
Megapixel Image Generation with Step-Unrolled Denoising Autoencoders
Megapixel Image Generation with Step-Unrolled Denoising Autoencoders
Alex F. McKinney
Chris G. Willcocks
DiffM
22
0
0
24 Jun 2022
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
...
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
EGVM
85
1,061
0
22 Jun 2022
Previous
12345678
Next