ResearchTrend.AI
Vector-quantized Image Modeling with Improved VQGAN

9 October 2021
Jiahui Yu
Xin Li
Jing Yu Koh
Han Zhang
Ruoming Pang
James Qin
Alexander Ku
Yuanzhong Xu
Jason Baldridge
Yonghui Wu
    ViT
    VLM
    DRL

Papers citing "Vector-quantized Image Modeling with Improved VQGAN"

50 / 372 papers shown
HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes
Yuhta Takida
Yukara Ikemiya
Takashi Shibuya
Kazuki Shimada
Woosung Choi
...
Naoki Murata
Toshimitsu Uesaka
Kengo Uchida
Wei-Hsiang Liao
Yuki Mitsufuji
BDL
30
11
0
31 Dec 2023
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM
MLLM
27
144
0
28 Dec 2023
MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers
Haoyu Ma
Shahin Mahdizadehaghdam
Bichen Wu
Zhipeng Fan
Yuchao Gu
Wenliang Zhao
Lior Shapira
Xiaohui Xie
DiffM
VGen
12
4
0
19 Dec 2023
Topic-VQ-VAE: Leveraging Latent Codebooks for Flexible Topic-Guided Document Generation
YoungJoon Yoo
Jongwon Choi
BDL
11
2
0
15 Dec 2023
SeiT++: Masked Token Modeling Improves Storage-efficient Training
Min-Seob Lee
Song Park
Byeongho Heo
Dongyoon Han
Hyunjung Shim
MQ
VLM
13
1
0
15 Dec 2023
Photorealistic Video Generation with Diffusion Models
Agrim Gupta
Lijun Yu
Kihyuk Sohn
Xiuye Gu
Meera Hahn
Fei-Fei Li
Irfan Essa
Lu Jiang
José Lezama
VGen
39
174
0
11 Dec 2023
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Oğuzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
39
62
0
11 Dec 2023
MMM: Generative Masked Motion Model
Ekkasit Pinyoanuntapong
Pu Wang
Minwoo Lee
Chen Chen
DiffM
VGen
27
43
0
06 Dec 2023
MagicStick: Controllable Video Editing via Control Handle Transformations
Yue Ma
Xiaodong Cun
Yin-Yin He
Chenyang Qi
Xintao Wang
Ying Shan
Xiu Li
Qifeng Chen
VGen
14
24
0
05 Dec 2023
GIVT: Generative Infinite-Vocabulary Transformers
Michael Tschannen
Cian Eastwood
Fabian Mentzer
10
34
0
04 Dec 2023
IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks
Jiarui Xu
Yossi Gandelsman
Amir Bar
Jianwei Yang
Jianfeng Gao
Trevor Darrell
Xiaolong Wang
VLM
21
3
0
04 Dec 2023
Improve Supervised Representation Learning with Masked Image Modeling
Kaifeng Chen
Daniel M. Salz
Huiwen Chang
Kihyuk Sohn
Dilip Krishnan
Mojtaba Seyedhosseini
SSL
ViT
32
2
0
01 Dec 2023
Sequential Modeling Enables Scalable Learning for Large Vision Models
Yutong Bai
Xinyang Geng
K. Mangalam
Amir Bar
Alan Yuille
Trevor Darrell
Jitendra Malik
Alexei A. Efros
MLLM
VLM
22
151
0
01 Dec 2023
Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding
Jin-Chuan Shi
Miao Wang
Hao-Bin Duan
Shao-Hua Guan
3DGS
25
84
0
30 Nov 2023
Do text-free diffusion models learn discriminative visual representations?
Soumik Mukhopadhyay
M. Gwilliam
Yosuke Yamaguchi
Vatsal Agarwal
Namitha Padmanabhan
Archana Swaminathan
Tianyi Zhou
Abhinav Shrivastava
DiffM
22
11
1
29 Nov 2023
Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following
Yutong Feng
Biao Gong
Di Chen
Yujun Shen
Yu Liu
Jingren Zhou
DiffM
21
43
0
28 Nov 2023
Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation
Yuhui Zhang
Brandon McKinzie
Zhe Gan
Vaishaal Shankar
Alexander Toshev
23
3
0
27 Nov 2023
EucliDreamer: Fast and High-Quality Texturing for 3D Models with Stable Diffusion Depth
Cindy X. Le
Congrui Hetang
Chendi Lin
Ang Cao
Yihui He
36
7
0
27 Nov 2023
Self-Supervised Music Source Separation Using Vector-Quantized Source Category Estimates
Marco Pasini
Stefan Lattner
George Fazekas
27
1
0
21 Nov 2023
Explainable Time Series Anomaly Detection using Masked Latent Generative Modeling
Daesoo Lee
Sara Malacarne
Erlend Aune
AI4TS
29
9
0
21 Nov 2023
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
Zhen Yang
Yingxue Zhang
Fandong Meng
Jie Zhou
VLM
MLLM
37
3
0
08 Nov 2023
RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches
Jiayuan Gu
Sean Kirmani
Paul Wohlhart
Yao Lu
Montse Gonzalez Arenas
...
Hao Su
Karol Hausman
Chelsea Finn
Q. Vuong
Ted Xiao
28
62
0
03 Nov 2023
Codebook Features: Sparse and Discrete Interpretability for Neural Networks
Alex Tamkin
Mohammad Taufeeque
Noah D. Goodman
19
27
0
26 Oct 2023
Blind Image Super-resolution with Rich Texture-Aware Codebooks
Rui Qin
Ming-hui Sun
Fangyuan Zhang
Xingsen Wen
Bin Wang
13
6
0
26 Oct 2023
VQ-NeRF: Neural Reflectance Decomposition and Editing with Vector Quantization
Hongliang Zhong
Jingbo Zhang
Jing Liao
19
5
0
18 Oct 2023
Towards image compression with perfect realism at ultra-low bitrates
Marlene Careil
Matthew Muckley
Jakob Verbeek
Stéphane Lathuilière
DiffM
21
44
0
16 Oct 2023
DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model
Xiaofan Li
Yifu Zhang
Xiaoqing Ye
VGen
65
71
0
11 Oct 2023
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
Lijun Yu
José Lezama
N. B. Gundavarapu
Luca Versari
Kihyuk Sohn
...
Boqing Gong
Ming-Hsuan Yang
Irfan Essa
David A. Ross
Lu Jiang
10
278
0
09 Oct 2023
Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient Vision Transformers
Shiyue Cao
Yueqin Yin
Lianghua Huang
Yu Liu
Xin Zhao
Deli Zhao
Kaiqi Huang
ViT
10
14
0
09 Oct 2023
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency
Tianhong Li
Sangnie Bhardwaj
Yonglong Tian
Han Zhang
Jarred Barber
Dina Katabi
Guillaume Lajoie
Huiwen Chang
Dilip Krishnan
VLM
36
4
0
05 Oct 2023
Generating 3D Brain Tumor Regions in MRI using Vector-Quantization Generative Adversarial Networks
Meng Zhou
Matthias W. Wagner
U. Tabori
C. Hawkins
B. Ertl-Wagner
Farzad Khalvati
MedIm
19
5
0
02 Oct 2023
GAIA-1: A Generative World Model for Autonomous Driving
Anthony Hu
Lloyd Russell
Hudson Yeo
Zak Murez
George Fedoseev
Alex Kendall
Jamie Shotton
Gianluca Corrado
VGen
22
215
0
29 Sep 2023
Finite Scalar Quantization: VQ-VAE Made Simple
Fabian Mentzer
David C. Minnen
E. Agustsson
Michael Tschannen
28
150
0
27 Sep 2023
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
Minsu Kim
J. Choi
Soumi Maiti
Jeong Hun Yeo
Shinji Watanabe
Y. Ro
VLM
16
6
0
15 Sep 2023
Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization
Yang Jin
Kun Xu
Kun Xu
Liwei Chen
Chao Liao
...
Xiaoqiang Lei
Di Zhang
Wenwu Ou
Kun Gai
Yadong Mu
MLLM
VLM
14
41
0
09 Sep 2023
Neural Vector Fields: Generalizing Distance Vector Fields by Codebooks and Zero-Curl Regularization
Xianghui Yang
Guosheng Lin
Zhenghao Chen
Luping Zhou
29
2
0
04 Sep 2023
RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model
Fengxiang Bie
Yibo Yang
Zhongzhu Zhou
Adam Ghanem
Minjia Zhang
...
Pareesa Ameneh Golnari
David A. Clifton
Yuxiong He
Dacheng Tao
S. Song
EGVM
25
17
0
02 Sep 2023
Pose-Free Neural Radiance Fields via Implicit Pose Regularization
Jiahui Zhang
Fangneng Zhan
Yingchen Yu
Kunhao Liu
Rongliang Wu
Xiaoqin Zhang
Ling Shao
Shijian Lu
19
9
0
29 Aug 2023
Painter: Teaching Auto-regressive Language Models to Draw Sketches
Reza Pourreza
Apratim Bhattacharyya
Sunny Panchal
Mingu Lee
Pulkit Madan
Roland Memisevic
24
5
0
16 Aug 2023
Controlling Character Motions without Observable Driving Source
Weiyuan Li
Bin Dai
Ziyi Zhou
Qi Yao
Baoyuan Wang
VGen
6
1
0
11 Aug 2023
Degeneration-Tuning: Using Scrambled Grid shield Unwanted Concepts from Stable Diffusion
Zixuan Ni
Longhui Wei
Jiacheng Li
Siliang Tang
Yueting Zhuang
Qi Tian
DiffM
23
21
0
02 Aug 2023
BARTPhoBEiT: Pre-trained Sequence-to-Sequence and Image Transformers Models for Vietnamese Visual Question Answering
Khiem Vinh Tran
Kiet Van Nguyen
N. Nguyen
ViT
17
2
0
28 Jul 2023
Online Clustered Codebook
Chuanxia Zheng
Andrea Vedaldi
37
26
0
27 Jul 2023
Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs
Or Sharir
Anima Anandkumar
22
0
0
27 Jul 2023
GaitMorph: Transforming Gait by Optimally Transporting Discrete Codes
Adrian Cosma
I. Radoi
51
3
0
27 Jul 2023
Grounded Object Centric Learning
Avinash Kori
Francesco Locatello
Fabio De Sousa Ribeiro
Francesca Toni
Ben Glocker
OCL
17
7
0
18 Jul 2023
Complexity Matters: Rethinking the Latent Space for Generative Modeling
Tianyang Hu
Fei Chen
Hong Wang
Jiawei Li
Wenjia Wang
Jiacheng Sun
Z. Li
DiffM
22
8
0
17 Jul 2023
DreamTeacher: Pretraining Image Backbones with Deep Generative Models
Daiqing Li
Huan Ling
Amlan Kar
David Acuna
Seung Wook Kim
Karsten Kreis
Antonio Torralba
Sanja Fidler
VLM
DiffM
15
26
0
14 Jul 2023
ChatGPT in the Age of Generative AI and Large Language Models: A Concise Survey
S. Mohamadi
G. Mujtaba
Ngan Le
Gianfranco Doretto
Don Adjeroh
LM&MA
AI4MH
21
21
0
09 Jul 2023
SVDM: Single-View Diffusion Model for Pseudo-Stereo 3D Object Detection
Yuguang Shi
DiffM
25
0
0
05 Jul 2023