2012.09841
Taming Transformers for High-Resolution Image Synthesis
17 December 2020
Patrick Esser
Robin Rombach
Bjorn Ommer
ViT
Papers citing
"Taming Transformers for High-Resolution Image Synthesis"
50 / 476 papers shown
Title
Can deep learning match the efficiency of human visual long-term memory in storing object details?
Emin Orhan
VLM
OCL
20
0
0
27 Apr 2022
Conformer and Blind Noisy Students for Improved Image Quality Assessment
Marcos V. Conde
Maxime Burchi
Radu Timofte
DiffM
25
13
0
27 Apr 2022
CTCNet: A CNN-Transformer Cooperation Network for Face Image Super-Resolution
Guangwei Gao
Zixiang Xu
Juncheng Li
Jian Yang
T. Zeng
Guo-Jun Qi
CVBM
ViT
SupR
29
80
0
19 Apr 2022
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance
Katherine Crowson
Stella Biderman
Daniel Kornis
Dashiell Stander
Eric Hallahan
Louis Castricato
Edward Raff
CLIP
57
367
0
18 Apr 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLM
DiffM
67
6,622
0
13 Apr 2022
No Token Left Behind: Explainability-Aided Image Classification and Generation
Roni Paiss
Hila Chefer
Lior Wolf
VLM
26
29
0
11 Apr 2022
Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer
Songwei Ge
Thomas Hayes
Harry Yang
Xiaoyue Yin
Guan Pang
David Jacobs
Jia-Bin Huang
Devi Parikh
ViT
33
214
0
07 Apr 2022
DT2I: Dense Text-to-Image Generation from Region Descriptions
Stanislav Frolov
Prateek Bansal
Jörn Hees
Andreas Dengel
VLM
19
5
0
05 Apr 2022
Autoregressive 3D Shape Generation via Canonical Mapping
A. Cheng
Xueting Li
Sifei Liu
Min Sun
Ming Yang
3DPC
37
39
0
05 Apr 2022
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Karren D. Yang
Dejan Marković
Steven Krenn
Vasu Agrawal
Alexander Richard
VGen
4
32
0
31 Mar 2022
VPTR: Efficient Transformers for Video Prediction
Xi Ye
Guillaume-Alexandre Bilodeau
ViT
19
18
0
29 Mar 2022
mc-BEiT: Multi-choice Discretization for Image BERT Pre-training
Xiaotong Li
Yixiao Ge
Kun Yi
Zixuan Hu
Ying Shan
Ling-yu Duan
16
38
0
29 Mar 2022
Diverse Plausible 360-Degree Image Outpainting for Efficient 3DCG Background Creation
Naofumi Akimoto
Yuhi Matsuo
Y. Aoki
41
34
0
28 Mar 2022
Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness
Giulio Lovisotto
Nicole Finnie
Mauricio Muñoz
Chaithanya Kumar Mummadi
J. H. Metzen
AAML
ViT
17
32
0
25 Mar 2022
Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
Oran Gafni
Adam Polyak
Oron Ashual
Shelly Sheynin
Devi Parikh
Yaniv Taigman
DiffM
17
510
0
24 Mar 2022
Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory
Lian Siyao
Weijiang Yu
Tianpei Gu
Chunze Lin
Quan Wang
Chao Qian
Chen Change Loy
Ziwei Liu
SLR
26
183
0
24 Mar 2022
ViewFormer: NeRF-free Neural Rendering from Few Images Using Transformers
Jonáš Kulhánek
Erik Derner
Torsten Sattler
Robert Babuška
ViT
25
73
0
18 Mar 2022
MatchFormer: Interleaving Attention in Transformers for Feature Matching
Qing Wang
Jiaming Zhang
Kailun Yang
Kunyu Peng
Rainer Stiefelhagen
ViT
31
141
0
17 Mar 2022
DU-VLG: Unifying Vision-and-Language Generation via Dual Sequence-to-Sequence Pre-training
Luyang Huang
Guocheng Niu
Jiachen Liu
Xinyan Xiao
Hua-Hong Wu
VLM
CoGe
14
7
0
17 Mar 2022
The Role of ImageNet Classes in Fréchet Inception Distance
Tuomas Kynkaanniemi
Tero Karras
M. Aittala
Timo Aila
J. Lehtinen
EGVM
VLM
22
200
0
11 Mar 2022
FlexIT: Towards Flexible Semantic Image Translation
Guillaume Couairon
Asya Grechka
Jakob Verbeek
Holger Schwenk
Matthieu Cord
DiffM
31
35
0
09 Mar 2022
Contextformer: A Transformer with Spatio-Channel Attention for Context Modeling in Learned Image Compression
A. B. Koyuncu
Han Gao
Atanas Boev
Georgii Gaikov
Elena Alshina
Eckehard Steinbach
ViT
31
67
0
04 Mar 2022
Autoregressive Image Generation using Residual Quantization
Doyup Lee
Chiheon Kim
Saehoon Kim
Minsu Cho
Wook-Shin Han
VGen
168
325
0
03 Mar 2022
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
Qiaole Dong
Chenjie Cao
Yanwei Fu
CLL
9
137
0
02 Mar 2022
CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP
Zihao W. Wang
Wei Liu
Qian He
Xin-ru Wu
Zili Yi
CLIP
VLM
182
71
0
01 Mar 2022
Real-World Blind Super-Resolution via Feature Matching with Implicit High-Resolution Priors
Chaofeng Chen
Xinyu Shi
Yipeng Qin
Xiaoming Li
Xiaoguang Han
Taojiannan Yang
Shihui Guo
17
113
0
26 Feb 2022
Diffusion bridges vector quantized Variational AutoEncoders
Max H. Cohen
Guillaume Quispe
Sylvain Le Corff
Charles Ollion
Eric Moulines
DiffM
11
13
0
10 Feb 2022
CM3: A Causal Masked Multimodal Model of the Internet
Armen Aghajanyan
Po-Yao (Bernie) Huang
Candace Ross
Vladimir Karpukhin
Hu Xu
...
Dmytro Okhonko
Mandar Joshi
Gargi Ghosh
M. Lewis
Luke Zettlemoyer
15
154
0
19 Jan 2022
RestoreFormer: High-Quality Blind Face Restoration from Undegraded Key-Value Pairs
Zhouxia Wang
Jiawei Zhang
Runjian Chen
Wenping Wang
Ping Luo
CVBM
13
109
0
17 Jan 2022
Music2Video: Automatic Generation of Music Video with fusion of audio and text
Yoonjeon Kim
Joel Jang
Sumin Shin
DiffM
VGen
13
7
0
11 Jan 2022
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation
Han Zhang
Weichong Yin
Yewei Fang
Lanxin Li
Boqiang Duan
Zhihua Wu
Yu Sun
Hao Tian
Hua-Hong Wu
Haifeng Wang
27
58
0
31 Dec 2021
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
83
14,580
0
20 Dec 2021
Text2Mesh: Text-Driven Neural Stylization for Meshes
O. Michel
Roi Bar-On
Richard Liu
Sagie Benaim
Rana Hanocka
CLIP
AI4CE
182
350
0
06 Dec 2021
Global Context with Discrete Diffusion in Vector Quantised Modelling for Image Generation
Minghui Hu
Yujie Wang
Tat-Jen Cham
Jianfei Yang
P. N. Suganthan
DiffM
11
40
0
03 Dec 2021
Exploration into Translation-Equivariant Image Quantization
W. Shin
Gyubok Lee
Jiyoung Lee
Eun-Young Lyou
Joonseok Lee
E. Choi
25
7
0
01 Dec 2021
CLIPstyler: Image Style Transfer with a Single Text Condition
Gihyun Kwon
Jong Chul Ye
VLM
CLIP
11
240
0
01 Dec 2021
Diffusion Autoencoders: Toward a Meaningful and Decodable Representation
Konpat Preechakul
Nattanat Chatthee
Suttisak Wizadwongsa
Supasorn Suwajanakorn
SyDa
DiffM
27
413
0
30 Nov 2021
EdiBERT, a generative model for image editing
Thibaut Issenhuth
Ugo Tanielian
Jérémie Mary
David Picard
DiffM
16
12
0
30 Nov 2021
Vector Quantized Diffusion Model for Text-to-Image Synthesis
Shuyang Gu
Dong Chen
Jianmin Bao
Fang Wen
Bo Zhang
Dongdong Chen
Lu Yuan
B. Guo
DiffM
45
756
0
29 Nov 2021
Blended Diffusion for Text-driven Editing of Natural Images
Omri Avrahami
Dani Lischinski
Ohad Fried
DiffM
11
919
0
29 Nov 2021
SWAT: Spatial Structure Within and Among Tokens
Kumara Kahatapitiya
Michael S. Ryoo
20
6
0
26 Nov 2021
Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations
Mehdi S. M. Sajjadi
H. Meyer
Etienne Pot
Urs M. Bergmann
Klaus Greff
...
Daniel Duckworth
Alexey Dosovitskiy
Jakob Uszkoreit
Thomas Funkhouser
Andrea Tagliasacchi
ViT
19
184
0
25 Nov 2021
PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
Baining Guo
ViT
33
238
0
24 Nov 2021
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
Chenfei Wu
Jian Liang
Lei Ji
Fan Yang
Yuejian Fang
Daxin Jiang
Nan Duan
ViT
VGen
14
292
0
24 Nov 2021
Discrete Representations Strengthen Vision Transformer Robustness
Chengzhi Mao
Lu Jiang
Mostafa Dehghani
Carl Vondrick
Rahul Sukthankar
Irfan Essa
ViT
25
43
0
20 Nov 2021
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
8
1,372
0
03 Nov 2021
Projected GANs Converge Faster
Axel Sauer
Kashyap Chitta
Jens Muller
Andreas Geiger
32
233
0
01 Nov 2021
Blending Anti-Aliasing into Vision Transformer
Shengju Qian
Hao Shao
Yi Zhu
Mu Li
Jiaya Jia
19
20
0
28 Oct 2021
Telling Creative Stories Using Generative Visual Aids
Safinah Ali
Devi Parikh
11
12
0
27 Oct 2021
Wav2CLIP: Learning Robust Audio Representations From CLIP
Ho-Hsiang Wu
Prem Seetharaman
Kundan Kumar
J. P. Bello
CLIP
VLM
16
267
0
21 Oct 2021