Taming Transformers for High-Resolution Image Synthesis

Computer Vision and Pattern Recognition (CVPR), 2020
17 December 2020
Patrick Esser
Robin Rombach
Bjorn Ommer
    ViT

Papers citing "Taming Transformers for High-Resolution Image Synthesis"

50 / 2,402 papers shown
Pretraining is All You Need for Image-to-Image Translation
Tengfei Wang
Ting Zhang
Bo Zhang
Hao Ouyang
Dong Chen
Qifeng Chen
Fang Wen
DiffM
433
198
0
25 May 2022
ASSET: Autoregressive Semantic Scene Editing with Transformers at High Resolutions
ACM Transactions on Graphics (TOG), 2022
Difan Liu
Sandesh Shetty
Tobias Hinz
Matthew Fisher
Richard Y. Zhang
Taesung Park
E. Kalogerakis
ViT
193
42
0
24 May 2022
M6-Fashion: High-Fidelity Multi-modal Image Generation and Editing
Zhikang Li
Huiling Zhou
Shuai Bai
Peike Li
Chang Zhou
Hongxia Yang
184
5
0
24 May 2022
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Neural Information Processing Systems (NeurIPS), 2022
Chitwan Saharia
William Chan
Saurabh Saxena
Lala Li
Jay Whang
...
Raphael Gontijo-Lopes
Tim Salimans
Jonathan Ho
David J Fleet
Mohammad Norouzi
VLM
1.2K
7,502
0
23 May 2022
Transformer-based out-of-distribution detection for clinically safe segmentation
International Conference on Medical Imaging with Deep Learning (MIDL), 2022
M. Graham
Petru-Daniel Tudosiu
P. Wright
W. H. Pinaya
J. U-King-im
...
H. Jäger
D. Werring
P. Nachev
Sebastien Ourselin
M. Jorge Cardoso
MedIm
160
26
0
21 May 2022
UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes
Neural Information Processing Systems (NeurIPS), 2022
Alexander Kolesnikov
André Susano Pinto
Lucas Beyer
Xiaohua Zhai
Jeremiah Harmsen
N. Houlsby
366
80
0
20 May 2022
Towards Unified Keyframe Propagation Models
Patrick Esser
Peter Michael
Soumyadip Sengupta
VGen
127
0
0
19 May 2022
Masked Image Modeling with Denoising Contrast
International Conference on Learning Representations (ICLR), 2022
Kun Yi
Yixiao Ge
Xiaotong Li
Shusheng Yang
Dian Li
Jianping Wu
Ying Shan
Xiaohu Qie
VLM
205
65
0
19 May 2022
BBDM: Image-to-image Translation with Brownian Bridge Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2022
Bo Li
Kaitao Xue
Yinan Han
Yunyu Lai
DiffM
408
237
0
16 May 2022
SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization
International Conference on Machine Learning (ICML), 2022
Yuhta Takida
Takashi Shibuya
Wei-Hsiang Liao
Chieh-Hsin Lai
Junki Ohmura
Toshimitsu Uesaka
Naoki Murata
Shusuke Takahashi
Toshiyuki Kumakura
Yuki Mitsufuji
BDL
221
90
0
16 May 2022
VQFR: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder
European Conference on Computer Vision (ECCV), 2022
Yuchao Gu
Xintao Wang
Liangbin Xie
Chao Dong
Gengyan Li
Ying Shan
Ming-Ming Cheng
199
166
0
13 May 2022
StyLandGAN: A StyleGAN based Landscape Image Synthesis using Depth-map
Gun-Hee Lee
Jonghwa Yim
Chanran Kim
Min-Jung Kim
GAN, MDE
269
2
0
13 May 2022
Deep Learning and Synthetic Media
Raphaël Millière
186
26
0
11 May 2022
Reduce Information Loss in Transformers for Pluralistic Image Inpainting
Computer Vision and Pattern Recognition (CVPR), 2022
Qiankun Liu
Zhentao Tan
Dongdong Chen
Qi Chu
Xiyang Dai
Yinpeng Chen
Mengchen Liu
Lu Yuan
Nenghai Yu
ViT
165
86
0
10 May 2022
Spatio-Temporal Transformer for Dynamic Facial Expression Recognition in the Wild
Fuyan Ma
Bin Sun
Shutao Li
ViT
117
50
0
10 May 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Xu Tan
Jiawei Chen
Haohe Liu
Jian Cong
Chen Zhang
...
Lei He
Frank Soong
Tao Qin
Sheng Zhao
Tie-Yan Liu
322
290
0
09 May 2022
Seeding Diversity into AI Art
International Conference on Computational Creativity (ICCC), 2022
Marvin Zammit
Antonios Liapis
Georgios N. Yannakakis
138
4
0
02 May 2022
CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers
Neural Information Processing Systems (NeurIPS), 2022
Ming Ding
Wendi Zheng
Wenyi Hong
Jie Tang
VLM
415
395
0
28 Apr 2022
Can deep learning match the efficiency of human visual long-term memory in storing object details?
Emin Orhan
VLM, OCL
227
0
0
27 Apr 2022
Conformer and Blind Noisy Students for Improved Image Quality Assessment
Marcos V. Conde
Maxime Burchi
Radu Timofte
DiffM
168
15
0
27 Apr 2022
An Overview of Recent Work in Media Forensics: Methods and Threats
Kratika Bhagtani
A. Yadav
Emily R. Bartusiak
Ziyue Xiang
Ruiting Shao
Sriram Baireddy
Edward J. Delp
AAML
286
28
0
26 Apr 2022
Semi-Parametric Neural Image Synthesis
A. Blattmann
Robin Rombach
Kaan Oktay
Jonas Muller
Bjorn Ommer
DiffM
300
32
0
25 Apr 2022
Opal: Multimodal Image Generation for News Illustration
ACM Symposium on User Interface Software and Technology (UIST), 2022
Vivian Liu
Han Qiao
Lydia B. Chilton
291
120
0
19 Apr 2022
CTCNet: A CNN-Transformer Cooperation Network for Face Image Super-Resolution
IEEE Transactions on Image Processing (IEEE TIP), 2022
Guangwei Gao
Zixiang Xu
Juncheng Li
Jian Yang
T. Zeng
Guo-Jun Qi
CVBM, ViT, SupR
294
119
0
19 Apr 2022
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance
European Conference on Computer Vision (ECCV), 2022
Katherine Crowson
Stella Biderman
Daniel Kornis
Dashiell Stander
Eric Hallahan
Louis Castricato
Edward Raff
CLIP
482
440
0
18 Apr 2022
Imagination-Augmented Natural Language Understanding
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Yujie Lu
Wanrong Zhu
Xinze Wang
Miguel P. Eckstein
William Yang Wang
218
25
0
18 Apr 2022
Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion
Computer Vision and Pattern Recognition (CVPR), 2022
Evonne Ng
Hanbyul Joo
Liwen Hu
Hao Li
Trevor Darrell
Angjoo Kanazawa
Shiry Ginosar
VGen
205
128
0
18 Apr 2022
Saliency in Augmented Reality
ACM Multimedia (ACM MM), 2022
Huiyu Duan
Wei Shen
Xiongkuo Min
Danyang Tu
Jing Li
Guangtao Zhai
128
40
0
18 Apr 2022
Simultaneous Multiple-Prompt Guided Generation Using Differentiable Optimal Transport
International Conference on Computational Creativity (ICCC), 2022
Yingtao Tian
Marco Cuturi
David R Ha
DiffM, OT
128
1
0
18 Apr 2022
Unconditional Image-Text Pair Generation with Multimodal Cross Quantizer
British Machine Vision Conference (BMVC), 2022
Hyungyu Lee
Sungjin Park
Joonseok Lee
Edward Choi
211
4
0
15 Apr 2022
Guided Co-Modulated GAN for 360° Field of View Extrapolation
International Conference on 3D Vision (3DV), 2022
Mohammad Reza Karimi Dastjerdi
Yannick Hold-Geoffroy
Jonathan Eisenmann
Siavash Khodadadeh
Jean-François Lalonde
147
36
0
15 Apr 2022
Any-resolution Training for High-resolution Image Synthesis
European Conference on Computer Vision (ECCV), 2022
Lucy Chai
Michael Gharbi
Eli Shechtman
Phillip Isola
Richard Y. Zhang
235
83
0
14 Apr 2022
An Identity-Preserved Framework for Human Motion Transfer
IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2022
Jingzhe Ma
Xiaoqing Zhang
Shiqi Yu
254
4
0
14 Apr 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLM, DiffM
1.1K
8,304
0
13 Apr 2022
No Token Left Behind: Explainability-Aided Image Classification and Generation
European Conference on Computer Vision (ECCV), 2022
Roni Paiss
Hila Chefer
Lior Wolf
VLM
209
35
0
11 Apr 2022
ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-wise Semantic Alignment and Generation
Computer Vision and Pattern Recognition (CVPR), 2022
Jianan Wang
Guansong Lu
Hang Xu
Zhenguo Li
Chunjing Xu
Yanwei Fu
246
18
0
09 Apr 2022
Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer
European Conference on Computer Vision (ECCV), 2022
Songwei Ge
Thomas Hayes
Harry Yang
Xiaoyue Yin
Guan Pang
David Jacobs
Jia-Bin Huang
Devi Parikh
ViT
531
271
0
07 Apr 2022
KNN-Diffusion: Image Generation via Large-Scale Retrieval
International Conference on Learning Representations (ICLR), 2022
Shelly Sheynin
Oron Ashual
Adam Polyak
Uriel Singer
Oran Gafni
Eliya Nachmani
Yaniv Taigman
VLM, SyDa, DiffM
238
147
0
06 Apr 2022
Text2LIVE: Text-Driven Layered Image and Video Editing
European Conference on Computer Vision (ECCV), 2022
Omer Bar-Tal
Dolev Ofri-Amar
Rafail Fridman
Yoni Kasten
Tali Dekel
VGen, DiffM
497
370
0
05 Apr 2022
DT2I: Dense Text-to-Image Generation from Region Descriptions
International Conference on Artificial Neural Networks (ICANN), 2022
Stanislav Frolov
Prateek Bansal
Jörn Hees
Andreas Dengel
VLM
165
5
0
05 Apr 2022
Autoregressive 3D Shape Generation via Canonical Mapping
European Conference on Computer Vision (ECCV), 2022
A. Cheng
Xueting Li
Sifei Liu
Min Sun
Ming-Hsuan Yang
3DPC
213
47
0
05 Apr 2022
High-Quality Pluralistic Image Completion via Code Shared VQGAN
Chuanxia Zheng
Guoxian Song
Tat-Jen Cham
Jianfei Cai
Dinh Q. Phung
Linjie Luo
VLM
192
12
0
05 Apr 2022
Quantized GAN for Complex Music Generation from Dance Videos
European Conference on Computer Vision (ECCV), 2022
Ye Zhu
Kyle Olszewski
Yuehua Wu
Panos Achlioptas
Menglei Chai
Yan Yan
Sergey Tulyakov
MGen
228
56
0
01 Apr 2022
Perception Prioritized Training of Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2022
Jooyoung Choi
Jungbeom Lee
Chaehun Shin
Sungwon Kim
Hyunwoo J. Kim
Sungroh Yoon
DiffM
296
327
0
01 Apr 2022
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Computer Vision and Pattern Recognition (CVPR), 2022
Karren D. Yang
Dejan Marković
Steven Krenn
Vasu Agrawal
Alexander Richard
VGen
168
45
0
31 Mar 2022
VPTR: Efficient Transformers for Video Prediction
International Conference on Pattern Recognition (ICPR), 2022
Xi Ye
Guillaume-Alexandre Bilodeau
ViT
233
28
0
29 Mar 2022
mc-BEiT: Multi-choice Discretization for Image BERT Pre-training
European Conference on Computer Vision (ECCV), 2022
Xiaotong Li
Yixiao Ge
Kun Yi
Zixuan Hu
Ying Shan
Ling-yu Duan
330
44
0
29 Mar 2022
Diverse Plausible 360-Degree Image Outpainting for Efficient 3DCG Background Creation
Computer Vision and Pattern Recognition (CVPR), 2022
Naofumi Akimoto
Yuhi Matsuo
Y. Aoki
213
49
0
28 Mar 2022
Fusing Global and Local Features for Generalized AI-Synthesized Image Detection
IEEE International Conference on Image Processing (ICIP), 2022
Yan Ju
Shan Jia
Lipeng Ke
Hongfei Xue
Koki Nagano
Siwei Lyu
310
105
0
26 Mar 2022
Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness
Computer Vision and Pattern Recognition (CVPR), 2022
Giulio Lovisotto
Nicole Finnie
Mauricio Muñoz
Chaithanya Kumar Mummadi
J. H. Metzen
AAML, ViT
138
48
0
25 Mar 2022