ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2012.09841
  4. Cited By
Taming Transformers for High-Resolution Image Synthesis

Taming Transformers for High-Resolution Image Synthesis

17 December 2020
Patrick Esser
Robin Rombach
Bjorn Ommer
    ViT
ArXivPDFHTML

Papers citing "Taming Transformers for High-Resolution Image Synthesis"

50 / 476 papers shown
Title
Can deep learning match the efficiency of human visual long-term memory
  in storing object details?
Can deep learning match the efficiency of human visual long-term memory in storing object details?
Emin Orhan
VLM
OCL
20
0
0
27 Apr 2022
Conformer and Blind Noisy Students for Improved Image Quality Assessment
Conformer and Blind Noisy Students for Improved Image Quality Assessment
Marcos V. Conde
Maxime Burchi
Radu Timofte
DiffM
25
13
0
27 Apr 2022
CTCNet: A CNN-Transformer Cooperation Network for Face Image
  Super-Resolution
CTCNet: A CNN-Transformer Cooperation Network for Face Image Super-Resolution
Guangwei Gao
Zixiang Xu
Juncheng Li
Jian Yang
T. Zeng
Guo-Jun Qi
CVBM
ViT
SupR
29
80
0
19 Apr 2022
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural
  Language Guidance
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance
Katherine Crowson
Stella Biderman
Daniel Kornis
Dashiell Stander
Eric Hallahan
Louis Castricato
Edward Raff
CLIP
57
367
0
18 Apr 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLM
DiffM
67
6,622
0
13 Apr 2022
No Token Left Behind: Explainability-Aided Image Classification and
  Generation
No Token Left Behind: Explainability-Aided Image Classification and Generation
Roni Paiss
Hila Chefer
Lior Wolf
VLM
26
29
0
11 Apr 2022
Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive
  Transformer
Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer
Songwei Ge
Thomas Hayes
Harry Yang
Xiaoyue Yin
Guan Pang
David Jacobs
Jia-Bin Huang
Devi Parikh
ViT
33
214
0
07 Apr 2022
DT2I: Dense Text-to-Image Generation from Region Descriptions
DT2I: Dense Text-to-Image Generation from Region Descriptions
Stanislav Frolov
Prateek Bansal
Jörn Hees
Andreas Dengel
VLM
19
5
0
05 Apr 2022
Autoregressive 3D Shape Generation via Canonical Mapping
Autoregressive 3D Shape Generation via Canonical Mapping
A. Cheng
Xueting Li
Sifei Liu
Min Sun
Ming Yang
3DPC
37
39
0
05 Apr 2022
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement
  by Re-Synthesis
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Karren D. Yang
Dejan Marković
Steven Krenn
Vasu Agrawal
Alexander Richard
VGen
4
32
0
31 Mar 2022
VPTR: Efficient Transformers for Video Prediction
VPTR: Efficient Transformers for Video Prediction
Xi Ye
Guillaume-Alexandre Bilodeau
ViT
19
18
0
29 Mar 2022
mc-BEiT: Multi-choice Discretization for Image BERT Pre-training
mc-BEiT: Multi-choice Discretization for Image BERT Pre-training
Xiaotong Li
Yixiao Ge
Kun Yi
Zixuan Hu
Ying Shan
Ling-yu Duan
16
38
0
29 Mar 2022
Diverse Plausible 360-Degree Image Outpainting for Efficient 3DCG
  Background Creation
Diverse Plausible 360-Degree Image Outpainting for Efficient 3DCG Background Creation
Naofumi Akimoto
Yuhi Matsuo
Y. Aoki
41
34
0
28 Mar 2022
Give Me Your Attention: Dot-Product Attention Considered Harmful for
  Adversarial Patch Robustness
Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness
Giulio Lovisotto
Nicole Finnie
Mauricio Muñoz
Chaithanya Kumar Mummadi
J. H. Metzen
AAML
ViT
17
32
0
25 Mar 2022
Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
Oran Gafni
Adam Polyak
Oron Ashual
Shelly Sheynin
Devi Parikh
Yaniv Taigman
DiffM
17
510
0
24 Mar 2022
Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic
  Memory
Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory
Lian Siyao
Weijiang Yu
Tianpei Gu
Chunze Lin
Quan Wang
Chao Qian
Chen Change Loy
Ziwei Liu
SLR
26
183
0
24 Mar 2022
ViewFormer: NeRF-free Neural Rendering from Few Images Using
  Transformers
ViewFormer: NeRF-free Neural Rendering from Few Images Using Transformers
Jonávs Kulhánek
Erik Derner
Torsten Sattler
Robert Babuvska
ViT
25
73
0
18 Mar 2022
MatchFormer: Interleaving Attention in Transformers for Feature Matching
MatchFormer: Interleaving Attention in Transformers for Feature Matching
Qing Wang
Jiaming Zhang
Kailun Yang
Kunyu Peng
Rainer Stiefelhagen
ViT
31
141
0
17 Mar 2022
DU-VLG: Unifying Vision-and-Language Generation via Dual
  Sequence-to-Sequence Pre-training
DU-VLG: Unifying Vision-and-Language Generation via Dual Sequence-to-Sequence Pre-training
Luyang Huang
Guocheng Niu
Jiachen Liu
Xinyan Xiao
Hua-Hong Wu
VLM
CoGe
14
7
0
17 Mar 2022
The Role of ImageNet Classes in Fréchet Inception Distance
The Role of ImageNet Classes in Fréchet Inception Distance
Tuomas Kynkaanniemi
Tero Karras
M. Aittala
Timo Aila
J. Lehtinen
EGVM
VLM
22
200
0
11 Mar 2022
FlexIT: Towards Flexible Semantic Image Translation
FlexIT: Towards Flexible Semantic Image Translation
Guillaume Couairon
Asya Grechka
Jakob Verbeek
Holger Schwenk
Matthieu Cord
DiffM
31
35
0
09 Mar 2022
Contextformer: A Transformer with Spatio-Channel Attention for Context
  Modeling in Learned Image Compression
Contextformer: A Transformer with Spatio-Channel Attention for Context Modeling in Learned Image Compression
A. B. Koyuncu
Han Gao
Atanas Boev
Georgii Gaikov
Elena Alshina
Eckehard Steinbach
ViT
31
67
0
04 Mar 2022
Autoregressive Image Generation using Residual Quantization
Autoregressive Image Generation using Residual Quantization
Doyup Lee
Chiheon Kim
Saehoon Kim
Minsu Cho
Wook-Shin Han
VGen
168
325
0
03 Mar 2022
Incremental Transformer Structure Enhanced Image Inpainting with Masking
  Positional Encoding
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
Qiaole Dong
Chenjie Cao
Yanwei Fu
CLL
9
137
0
02 Mar 2022
CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP
CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP
Zihao W. Wang
Wei Liu
Qian He
Xin-ru Wu
Zili Yi
CLIP
VLM
182
71
0
01 Mar 2022
Real-World Blind Super-Resolution via Feature Matching with Implicit
  High-Resolution Priors
Real-World Blind Super-Resolution via Feature Matching with Implicit High-Resolution Priors
Chaofeng Chen
Xinyu Shi
Yipeng Qin
Xiaoming Li
Xiaoguang Han
Taojiannan Yang
Shihui Guo
17
113
0
26 Feb 2022
Diffusion bridges vector quantized Variational AutoEncoders
Diffusion bridges vector quantized Variational AutoEncoders
Max H. Cohen
Guillaume Quispe
Sylvain Le Corff
Charles Ollion
Eric Moulines
DiffM
11
13
0
10 Feb 2022
CM3: A Causal Masked Multimodal Model of the Internet
CM3: A Causal Masked Multimodal Model of the Internet
Armen Aghajanyan
Po-Yao (Bernie) Huang
Candace Ross
Vladimir Karpukhin
Hu Xu
...
Dmytro Okhonko
Mandar Joshi
Gargi Ghosh
M. Lewis
Luke Zettlemoyer
15
154
0
19 Jan 2022
RestoreFormer: High-Quality Blind Face Restoration from Undegraded
  Key-Value Pairs
RestoreFormer: High-Quality Blind Face Restoration from Undegraded Key-Value Pairs
Zhouxia Wang
Jiawei Zhang
Runjian Chen
Wenping Wang
Ping Luo
CVBM
13
109
0
17 Jan 2022
Music2Video: Automatic Generation of Music Video with fusion of audio
  and text
Music2Video: Automatic Generation of Music Video with fusion of audio and text
Yoonjeon Kim
Joel Jang
Sumin Shin
DiffM
VGen
13
7
0
11 Jan 2022
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional
  Vision-Language Generation
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation
Han Zhang
Weichong Yin
Yewei Fang
Lanxin Li
Boqiang Duan
Zhihua Wu
Yu Sun
Hao Tian
Hua-Hong Wu
Haifeng Wang
27
58
0
31 Dec 2021
High-Resolution Image Synthesis with Latent Diffusion Models
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
83
14,580
0
20 Dec 2021
Text2Mesh: Text-Driven Neural Stylization for Meshes
Text2Mesh: Text-Driven Neural Stylization for Meshes
O. Michel
Roi Bar-On
Richard Liu
Sagie Benaim
Rana Hanocka
CLIP
AI4CE
182
350
0
06 Dec 2021
Global Context with Discrete Diffusion in Vector Quantised Modelling for
  Image Generation
Global Context with Discrete Diffusion in Vector Quantised Modelling for Image Generation
Minghui Hu
Yujie Wang
Tat-Jen Cham
Jianfei Yang
P.N.Suganthan
DiffM
11
40
0
03 Dec 2021
Exploration into Translation-Equivariant Image Quantization
Exploration into Translation-Equivariant Image Quantization
W. Shin
Gyubok Lee
Jiyoung Lee
Eun-Young Lyou
Joonseok Lee
E. Choi
25
7
0
01 Dec 2021
CLIPstyler: Image Style Transfer with a Single Text Condition
CLIPstyler: Image Style Transfer with a Single Text Condition
Gihyun Kwon
Jong Chul Ye
VLM
CLIP
11
240
0
01 Dec 2021
Diffusion Autoencoders: Toward a Meaningful and Decodable Representation
Diffusion Autoencoders: Toward a Meaningful and Decodable Representation
Konpat Preechakul
Nattanat Chatthee
Suttisak Wizadwongsa
Supasorn Suwajanakorn
SyDa
DiffM
27
413
0
30 Nov 2021
EdiBERT, a generative model for image editing
EdiBERT, a generative model for image editing
Thibaut Issenhuth
Ugo Tanielian
Jérémie Mary
David Picard
DiffM
16
12
0
30 Nov 2021
Vector Quantized Diffusion Model for Text-to-Image Synthesis
Vector Quantized Diffusion Model for Text-to-Image Synthesis
Shuyang Gu
Dong Chen
Jianmin Bao
Fang Wen
Bo Zhang
Dongdong Chen
Lu Yuan
B. Guo
DiffM
45
756
0
29 Nov 2021
Blended Diffusion for Text-driven Editing of Natural Images
Blended Diffusion for Text-driven Editing of Natural Images
Omri Avrahami
Dani Lischinski
Ohad Fried
DiffM
11
919
0
29 Nov 2021
SWAT: Spatial Structure Within and Among Tokens
SWAT: Spatial Structure Within and Among Tokens
Kumara Kahatapitiya
Michael S. Ryoo
20
6
0
26 Nov 2021
Scene Representation Transformer: Geometry-Free Novel View Synthesis
  Through Set-Latent Scene Representations
Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations
Mehdi S. M. Sajjadi
H. Meyer
Etienne Pot
Urs M. Bergmann
Klaus Greff
...
Daniel Duckworth
Alexey Dosovitskiy
Jakob Uszkoreit
Thomas Funkhouser
Andrea Tagliasacchi
ViT
19
184
0
25 Nov 2021
PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers
PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
Baining Guo
ViT
33
238
0
24 Nov 2021
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
Chenfei Wu
Jian Liang
Lei Ji
Fan Yang
Yuejian Fang
Daxin Jiang
Nan Duan
ViT
VGen
14
292
0
24 Nov 2021
Discrete Representations Strengthen Vision Transformer Robustness
Discrete Representations Strengthen Vision Transformer Robustness
Chengzhi Mao
Lu Jiang
Mostafa Dehghani
Carl Vondrick
Rahul Sukthankar
Irfan Essa
ViT
25
43
0
20 Nov 2021
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
8
1,372
0
03 Nov 2021
Projected GANs Converge Faster
Projected GANs Converge Faster
Axel Sauer
Kashyap Chitta
Jens Muller
Andreas Geiger
32
233
0
01 Nov 2021
Blending Anti-Aliasing into Vision Transformer
Blending Anti-Aliasing into Vision Transformer
Shengju Qian
Hao Shao
Yi Zhu
Mu Li
Jiaya Jia
19
20
0
28 Oct 2021
Telling Creative Stories Using Generative Visual Aids
Telling Creative Stories Using Generative Visual Aids
Safinah Ali
Devi Parikh
11
12
0
27 Oct 2021
Wav2CLIP: Learning Robust Audio Representations From CLIP
Wav2CLIP: Learning Robust Audio Representations From CLIP
Ho-Hsiang Wu
Prem Seetharaman
Kundan Kumar
J. P. Bello
CLIP
VLM
16
267
0
21 Oct 2021
Previous
123...1089
Next