2012.09841
Taming Transformers for High-Resolution Image Synthesis
Computer Vision and Pattern Recognition (CVPR), 2020
17 December 2020
Patrick Esser
Robin Rombach
Bjorn Ommer
ViT
Github (6185★)
Papers citing
"Taming Transformers for High-Resolution Image Synthesis"
50 / 2,404 papers shown
StyleSwin: Transformer-based GAN for High-resolution Image Generation
Computer Vision and Pattern Recognition (CVPR), 2021
Bowen Zhang
Shuyang Gu
Bo Zhang
Jianmin Bao
Dong Chen
Fang Wen
Yong Wang
B. Guo
ViT
459
293
0
20 Dec 2021
High-Resolution Image Synthesis with Latent Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2021
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
DiffM
3.1K
21,434
0
20 Dec 2021
Solving Inverse Problems with NerfGANs
Giannis Daras
Wenqing Chu
Abhishek Kumar
Dmitry Lagun
A. Dimakis
3DV
201
6
0
16 Dec 2021
Tackling the Generative Learning Trilemma with Denoising Diffusion GANs
Zhisheng Xiao
Karsten Kreis
Arash Vahdat
DiffM
434
678
0
15 Dec 2021
MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning
C. Eichenberg
Sid Black
Samuel Weinbach
Letitia Parcalabescu
Anette Frank
MLLM
VLM
259
110
0
09 Dec 2021
Multimodal Conditional Image Synthesis with Product-of-Experts GANs
Xun Huang
Arun Mallya
Ting-Chun Wang
Xuan Li
DiffM
269
102
0
09 Dec 2021
Text2Mesh: Text-Driven Neural Stylization for Meshes
Computer Vision and Pattern Recognition (CVPR), 2021
O. Michel
Roi Bar-On
Richard Liu
Sagie Benaim
Rana Hanocka
CLIP
AI4CE
1.3K
420
0
06 Dec 2021
Global Context with Discrete Diffusion in Vector Quantised Modelling for Image Generation
Minghui Hu
Yujie Wang
Tat-Jen Cham
Jianfei Yang
P. N. Suganthan
DiffM
174
52
0
03 Dec 2021
Zero-Shot Text-Guided Object Generation with Dream Fields
Ajay Jain
B. Mildenhall
Jonathan T. Barron
Pieter Abbeel
Ben Poole
402
636
0
02 Dec 2021
Exploration into Translation-Equivariant Image Quantization
W. Shin
Gyubok Lee
Jiyoung Lee
Eun-Young Lyou
Joonseok Lee
Edward Choi
216
8
0
01 Dec 2021
CLIPstyler: Image Style Transfer with a Single Text Condition
Gihyun Kwon
Jong Chul Ye
VLM
CLIP
528
317
0
01 Dec 2021
Diffusion Autoencoders: Toward a Meaningful and Decodable Representation
Konpat Preechakul
Nattanat Chatthee
Suttisak Wizadwongsa
Supasorn Suwajanakorn
SyDa
DiffM
427
538
0
30 Nov 2021
EdiBERT, a generative model for image editing
Thibaut Issenhuth
Ugo Tanielian
Jérémie Mary
David Picard
DiffM
342
13
0
30 Nov 2021
Vector Quantized Diffusion Model for Text-to-Image Synthesis
Shuyang Gu
Dong Chen
Jianmin Bao
Fang Wen
Bo Zhang
Dongdong Chen
Lu Yuan
B. Guo
DiffM
579
952
0
29 Nov 2021
Blended Diffusion for Text-driven Editing of Natural Images
Omri Avrahami
Dani Lischinski
Ohad Fried
DiffM
521
1,141
0
29 Nov 2021
SWAT: Spatial Structure Within and Among Tokens
International Joint Conference on Artificial Intelligence (IJCAI), 2021
Kumara Kahatapitiya
Michael S. Ryoo
272
8
0
26 Nov 2021
Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations
Computer Vision and Pattern Recognition (CVPR), 2021
Mehdi S. M. Sajjadi
H. Meyer
Etienne Pot
Urs M. Bergmann
Klaus Greff
...
Daniel Duckworth
Alexey Dosovitskiy
Jakob Uszkoreit
Thomas Funkhouser
Andrea Tagliasacchi
ViT
428
230
0
25 Nov 2021
Layered Controllable Video Generation
Jiahui Huang
Yuhe Jin
K. M. Yi
Leonid Sigal
VGen
405
12
0
24 Nov 2021
PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
Baining Guo
ViT
371
272
0
24 Nov 2021
Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes
Sam Bond-Taylor
P. Hessey
Hiroshi Sasaki
T. Breckon
Chris G. Willcocks
DiffM
298
86
0
24 Nov 2021
Octree Transformer: Autoregressive 3D Shape Generation on Hierarchically Structured Sequences
Moritz Ibing
Gregor Kobsik
Leif Kobbelt
217
42
0
24 Nov 2021
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
Chenfei Wu
Jian Liang
Lei Ji
Fan Yang
Yuejian Fang
Daxin Jiang
Nan Duan
ViT
VGen
314
344
0
24 Nov 2021
One to Transfer All: A Universal Transfer Framework for Vision Foundation Model with Few Data
Yujie Wang
Junqin Huang
Mengya Gao
Yichao Wu
Zhen-fei Yin
Ding Liang
Junjie Yan
174
0
0
24 Nov 2021
L-Verse: Bidirectional Generation Between Image and Text
Computer Vision and Pattern Recognition (CVPR), 2021
Taehoon Kim
Gwangmo Song
Sihaeng Lee
Sangyun Kim
Yewon Seo
Soonyoung Lee
S. Kim
Honglak Lee
Kyunghoon Bae
1.0K
28
0
22 Nov 2021
Discrete Representations Strengthen Vision Transformer Robustness
International Conference on Learning Representations (ICLR), 2021
Chengzhi Mao
Lu Jiang
Mostafa Dehghani
Carl Vondrick
Rahul Sukthankar
Irfan Essa
ViT
303
46
0
20 Nov 2021
Compositional Transformers for Scene Generation
Drew A. Hudson
C. L. Zitnick
ViT
263
35
0
17 Nov 2021
INTERN: A New Learning Paradigm Towards General Vision
Jing Shao
Siyu Chen
Yangguang Li
Kun Wang
Zhen-fei Yin
...
F. Yu
Junjie Yan
Dahua Lin
Xiaogang Wang
Yu Qiao
237
39
0
16 Nov 2021
Losses, Dissonances, and Distortions
Pablo Samuel Castro
66
0
0
08 Nov 2021
Improving Visual Quality of Image Synthesis by A Token-based Generator with Transformers
Yanhong Zeng
Huan Yang
Hongyang Chao
Jianbo Wang
Jianlong Fu
ViT
320
30
0
05 Nov 2021
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
865
1,722
0
03 Nov 2021
PatchGame: Learning to Signal Mid-level Patches in Referential Games
Neural Information Processing Systems (NeurIPS), 2021
Kamal Gupta
Gowthami Somepalli
Anubhav Gupta
Vinoj Jayasundara
Matthias Zwicker
Abhinav Shrivastava
188
4
0
02 Nov 2021
Projected GANs Converge Faster
Neural Information Processing Systems (NeurIPS), 2021
Axel Sauer
Kashyap Chitta
Jens Muller
Andreas Geiger
299
284
0
01 Nov 2021
Blending Anti-Aliasing into Vision Transformer
Neural Information Processing Systems (NeurIPS), 2021
Shengju Qian
Hao Shao
Yi Zhu
Mu Li
Jiaya Jia
213
23
0
28 Oct 2021
Telling Creative Stories Using Generative Visual Aids
Safinah Ali
Devi Parikh
79
15
0
27 Oct 2021
Towards artificial general intelligence via a multimodal foundation model
Nanyi Fei
Zhiwu Lu
Yizhao Gao
Guoxing Yang
Yuqi Huo
...
Ruihua Song
Xin Gao
Tao Xiang
Haoran Sun
Jiling Wen
AI4CE
LRM
233
290
0
27 Oct 2021
The Nuts and Bolts of Adopting Transformer in GANs
Rui Xu
Xiangyu Xu
Kai-xiang Chen
Bolei Zhou
Chen Change Loy
ViT
340
4
0
25 Oct 2021
Unsupervised Source Separation By Steering Pretrained Music Models
Ethan Manilow
P. O'Reilly
Prem Seetharaman
Bryan Pardo
183
2
0
25 Oct 2021
Wav2CLIP: Learning Robust Audio Representations From CLIP
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Ho-Hsiang Wu
Prem Seetharaman
Kundan Kumar
J. P. Bello
CLIP
VLM
346
331
0
21 Oct 2021
3D-RETR: End-to-End Single and Multi-View 3D Reconstruction with Transformers
Z. Shi
Zhao Meng
Yiran Xing
Yunpu Ma
Roger Wattenhofer
ViT
211
37
0
17 Oct 2021
Taming Visually Guided Sound Generation
Vladimir E. Iashin
Esa Rahtu
VLM
320
175
0
17 Oct 2021
AE-StyleGAN: Improved Training of Style-Based Auto-Encoders
Ligong Han
S. Musunuri
Martin Renqiang Min
Ruijiang Gao
Yu Tian
Dimitris N. Metaxas
DRL
162
15
0
17 Oct 2021
Multimodal Dialogue Response Generation
Qingfeng Sun
Yujing Wang
Can Xu
Kai Zheng
Yaming Yang
Huang Hu
Fei Xu
Jessica Zhang
Xiubo Geng
Daxin Jiang
260
52
0
16 Oct 2021
Vector-quantized Image Modeling with Improved VQGAN
International Conference on Learning Representations (ICLR), 2021
Jiahui Yu
Xin Li
Jing Yu Koh
Han Zhang
Ruoming Pang
James Qin
Alexander Ku
Yuanzhong Xu
Jason Baldridge
Yonghui Wu
ViT
VLM
DRL
497
688
0
09 Oct 2021
ATISS: Autoregressive Transformers for Indoor Scene Synthesis
Despoina Paschalidou
Amlan Kar
Maria Shugrina
Karsten Kreis
Andreas Geiger
Sanja Fidler
3DV
ViT
374
208
0
07 Oct 2021
DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation
Gwanghyun Kim
Taesung Kwon
Jong Chul Ye
DiffM
986
793
0
06 Oct 2021
CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation
Aditya Sanghi
Hang Chu
Joseph G. Lambourne
Ye Wang
Chin-Yi Cheng
Marco Fumero
Kamal Rahimi Malekshan
CLIP
370
344
0
06 Oct 2021
Transformer Assisted Convolutional Network for Cell Instance Segmentation
Deepanshu Pandey
Pradyumna Gupta
Sumit K. Bhattacharya
Aman Sinha
Rohit Agarwal
ViT
MedIm
176
4
0
05 Oct 2021
AffectGAN: Affect-Based Generative Art Driven by Semantics
Theodoros Galanos
Antonios Liapis
Georgios N. Yannakakis
GAN
188
15
0
30 Sep 2021
UFO-ViT: High Performance Linear Vision Transformer without Softmax
Jeonggeun Song
ViT
325
27
0
29 Sep 2021
Resolution-robust Large Mask Inpainting with Fourier Convolutions
Roman Suvorov
Elizaveta Logacheva
Anton Mashikhin
Anastasia Remizova
Arsenii Ashukha
Aleksei Silvestrov
Naejin Kong
Harshith Goka
Kiwoong Park
Victor Lempitsky
343
1,170
0
15 Sep 2021