2012.09841
Cited By
Taming Transformers for High-Resolution Image Synthesis
Computer Vision and Pattern Recognition (CVPR), 2020
17 December 2020
Patrick Esser
Robin Rombach
Bjorn Ommer
ViT
Github (6185★)
Papers citing "Taming Transformers for High-Resolution Image Synthesis"
50 / 2,374 papers shown
Title
Real-Time Person Image Synthesis Using a Flow Matching Model
Jiwoo Jeong
Kirok Kim
Wooju Kim
Nam-Joon Kim
3DH
198
0
0
06 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
999
26
0
05 May 2025
RobSurv: Vector Quantization-Based Multi-Modal Learning for Robust Cancer Survival Prediction
Aiman Farooq
Azad Singh
Deepak Mishra
S. Chaudhury
153
0
0
05 May 2025
Any-to-Any Vision-Language Model for Multimodal X-ray Imaging and Radiological Report Generation
Daniele Molino
Francesco Di Feola
Linlin Shen
Paolo Soda
V. Guarrasi
MedIm
LM&MA
222
2
0
02 May 2025
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
Shihong Deng
Ziyu Guo
Renrui Zhang
Zhuofan Zong
Hao Li
Le Zhuo
Shilin Yan
Pheng-Ann Heng
Haoyang Li
LRM
396
74
0
01 May 2025
Can We Achieve Efficient Diffusion without Self-Attention? Distilling Self-Attention into Convolutions
Ziyi Dong
Chengxing Zhou
Weijian Deng
Pengxu Wei
Xiangyang Ji
Guanbin Li
MQ
239
0
0
30 Apr 2025
Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning
Computer Vision and Pattern Recognition (CVPR), 2025
Jiadong Wang
Tianci Luo
Yaohua Zha
Yan Feng
Ruisheng Luo
Bin Chen
Tao Dai
Long Chen
Yaowei Wang
Shu-Tao Xia
VLM
232
0
0
30 Apr 2025
Revisiting Diffusion Autoencoder Training for Image Reconstruction Quality
Pramook Khungurn
Sukit Seripanitkarn
Phonphrm Thawatdamrongkit
Supasorn Suwajanakorn
DiffM
299
0
0
30 Apr 2025
AGHI-QA: A Subjective-Aligned Dataset and Metric for AI-Generated Human Images
Yunhao Li
Sijing Wu
Wei Sun
Zhichao Zhang
Yucheng Zhu
Zicheng Zhang
Huiyu Duan
Xiongkuo Min
Guangtao Zhai
EGVM
234
7
0
30 Apr 2025
Why Compress What You Can Generate? When GPT-4o Generation Ushers in Image Compression Fields
Yixin Gao
Xiaohan Pan
Xiaochen Li
Zhibo Chen
198
1
0
30 Apr 2025
GarmentX: Autoregressive Parametric Representations for High-Fidelity 3D Garment Generation
Jingfeng Guo
Jianfei Chen
Weikai Chen
Zhenyu Sun
Lanjiong Li
Baozhu Zhao
Lingting Zhu
Xinyu Wang
Qi Liu
3DH
325
0
0
29 Apr 2025
EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation
Zhe Dong
Yuzhe Sun
Tianzhu Liu
Wangmeng Zuo
Yanfeng Gu
182
1
0
28 Apr 2025
RadioFormer: A Multiple-Granularity Radio Map Estimation Transformer with 1‱ Spatial Sampling
Zheng Fang
Kangjun Liu
Ke Chen
Qingyu Liu
Junxuan Zhang
Lingyang Song
Yaowei Wang
217
1
0
27 Apr 2025
REED-VAE: RE-Encode Decode Training for Iterative Image Editing with Diffusion Models
Gal Almog
Ariel Shamir
Ohad Fried
DiffM
199
1
0
26 Apr 2025
E-InMeMo: Enhanced Prompting for Visual In-Context Learning
Journal of Imaging (JI), 2025
Jiahao Zhang
Bowen Wang
Hong Liu
Liangzhi Li
Yuta Nakashima
Hajime Nagahara
VLM
326
1
0
25 Apr 2025
Enhancing Variational Autoencoders with Smooth Robust Latent Encoding
Hyomin Lee
Minseon Kim
Sangwon Jang
Jongheon Jeong
Sung Ju Hwang
DiffM
AAML
176
4
0
24 Apr 2025
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Xu Ma
Peize Sun
Haoyu Ma
Hao Tang
Chih-Yao Ma
...
Matt Feiszli
Peizhao Zhang
Peter Vajda
Sam S. Tsai
Y. Fu
412
11
0
24 Apr 2025
Fast Autoregressive Models for Continuous Latent Generation
Tiankai Hang
Jianmin Bao
Fangyun Wei
Dong Chen
DiffM
187
3
0
24 Apr 2025
Dual Prompting Image Restoration with Diffusion Transformers
Computer Vision and Pattern Recognition (CVPR), 2025
Dehong Kong
Fan Li
Zhixin Wang
Jiaqi Xu
Renjing Pei
Wenbo Li
Wenqi Ren
DiffM
256
5
0
24 Apr 2025
DIVE: Inverting Conditional Diffusion Models for Discriminative Tasks
IEEE transactions on multimedia (TMM), 2025
Yinqi Li
Hong Chang
Ruibing Hou
Shiguang Shan
Xilin Chen
DiffM
259
1
0
24 Apr 2025
Hyper-Transforming Latent Diffusion Models
I. Peis
Batuhan Koyuncu
Isabel Valera
J. Frellsen
371
1
0
23 Apr 2025
Distilling semantically aware orders for autoregressive image generation
Rishav Pramanik
Antoine Poupon
Juan A. Rodriguez
Masih Aminbeidokhti
David Vazquez
Christopher Pal
Zhaozheng Yin
M. Pedersoli
236
0
0
23 Apr 2025
Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning
Wang Lin
Liyu Jia
Wentao Hu
Kaihang Pan
Zhongqi Yue
Wei Zhao
Jingyuan Chen
Fei Wu
Hanwang Zhang
VGen
226
8
0
22 Apr 2025
MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World
Computer Vision and Pattern Recognition (CVPR), 2025
Ankit Dhiman
Manan Shah
R. V. Babu
218
1
0
21 Apr 2025
Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis
Jingjing Ren
Wenbo Li
Zhongdao Wang
Haoze Sun
Bangzhen Liu
...
Aoxue Li
Shifeng Zhang
Bin Shao
Yong Guo
Lei Zhu
VGen
233
6
0
20 Apr 2025
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Computer Vision and Pattern Recognition (CVPR), 2025
Kaihang Pan
Wang Lin
Zhongqi Yue
Tenglong Ao
Liyu Jia
Wei Zhao
Juncheng Billy Li
Siliang Tang
Hanwang Zhang
228
15
0
20 Apr 2025
The Path to Reconciling Quality and Safety in Text-to-Image Generation: Dataset, Method, and Evaluation
Shouwei Ruan
Zhenyu Wu
Yao Huang
Ruochen Zhang
Yitong Sun
Caixin Kang
Shiji Zhao
Xingxing Wei
EGVM
383
1
0
19 Apr 2025
Towards Explainable Fake Image Detection with Multi-Modal Large Language Models
Yikun Ji
Y. Hong
Jiahui Zhan
H. Chen
Jun Lan
Huijia Zhu
Weiqiang Wang
Guang Dai
Jianfu Zhang
MLLM
LRM
420
4
0
19 Apr 2025
Hierarchical Vector Quantized Graph Autoencoder with Annealing-Based Code Selection
The Web Conference (WWW), 2025
Long Zeng
Jianxiang Yu
Jiapeng Zhu
Qingsong Zhong
Xiang Li
191
6
0
17 Apr 2025
Image Editing with Diffusion Models: A Survey
Jia Wang
Jie Hu
Xiaoqi Ma
Hanghang Ma
Xiaoming Wei
Enhua Wu
267
4
0
17 Apr 2025
SkyReels-V2: Infinite-length Film Generative Model
Guibin Chen
D. Lin
Jiangping Yang
Chunze Lin
J. Zhu
...
Di Qiu
Debang Li
Zhengcong Fei
Yang Li
Yahui Zhou
DiffM
VGen
414
64
0
17 Apr 2025
Deep Generative Model-Based Generation of Synthetic Individual-Specific Brain MRI Segmentations
Ruijie Wang
Luca Rossetto
Susan Mérillat
Christina Röcke
Mike Martin
Abraham Bernstein
DiffM
MedIm
448
0
0
15 Apr 2025
Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception
International Conference on Learning Representations (ICLR), 2025
Ziqi Pang
Xin Xu
Yu-Xiong Wang
DiffM
441
1
0
15 Apr 2025
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL
Junke Wang
Zhi Tian
Xinyu Wang
Xinyu Zhang
Weilin Huang
Zuxuan Wu
Yu Jiang
VGen
354
52
0
15 Apr 2025
Autoregressive Distillation of Diffusion Transformers
Computer Vision and Pattern Recognition (CVPR), 2025
Yeongmin Kim
Sotiris Anagnostidis
Yuming Du
Edgar Schönfeld
Jonas Kohler
Markos Georgopoulos
Albert Pumarola
Ali K. Thabet
A. Sanakoyeu
242
2
0
15 Apr 2025
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers
Xingjian Leng
Jaskirat Singh
Yunzhong Hou
Zhenchang Xing
Saining Xie
Liang Zheng
292
61
0
14 Apr 2025
InstructEngine: Instruction-driven Text-to-Image Alignment
Xingyu Lu
Yihan Hu
Yuanxing Zhang
Kaiyu Jiang
Changyi Liu
...
Bin Wen
C. Yuan
Fan Yang
Yan Li
Di Zhang
284
0
0
14 Apr 2025
Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing
Taihang Hu
Linxuan Li
Kai Wang
Yaxing Wang
Jian Yang
Ming-Ming Cheng
DiffM
VGen
258
4
0
14 Apr 2025
D²iT: Dynamic Diffusion Transformer for Accurate Image Generation
Computer Vision and Pattern Recognition (CVPR), 2025
Weinan Jia
Mengqi Huang
Nan Chen
Lei Zhang
Zhendong Mao
262
6
0
13 Apr 2025
Generation of Musical Timbres using a Text-Guided Diffusion Model
Weixuan Yuan
Qadeer Khan
Vladimir Golkov
DiffM
172
0
0
12 Apr 2025
Head-Aware KV Cache Compression for Efficient Visual Autoregressive Modeling
Ziran Qin
Youru Lv
Mingbao Lin
Zeren Zhang
Danping Zou
Weiyao Lin
VLM
241
5
0
12 Apr 2025
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Tianwei Xiong
Jun Hao Liew
Zilong Huang
Jiashi Feng
Xihui Liu
268
17
0
11 Apr 2025
ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration
Yongsheng Yu
Haitian Zheng
Zhifei Zhang
Jianming Zhang
Yuqian Zhou
Connelly Barnes
Yixiao Liu
Wei Xiong
Zhe Lin
Jiebo Luo
295
1
0
11 Apr 2025
Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging
Gabriele Lozupone
Alessandro Bria
F. Fontanella
Frederick J.A. Meijer
C. D. Stefano
Henkjan Huisman
DiffM
MedIm
123
0
0
11 Apr 2025
Diffusion Models for Robotic Manipulation: A Survey
Frontiers in Robotics and AI (Front. Robot. AI), 2025
Rosa Wolf
Yitian Shi
Sheng Liu
Rania Rayyes
411
23
0
11 Apr 2025
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft
Junliang Guo
Yang Ye
Tianyu He
Haoyu Wu
Yushu Jiang
Tim Pearce
Li Zhao
VGen
SyDa
265
33
0
11 Apr 2025
Model Discrepancy Learning: Synthetic Faces Detection Based on Multi-Reconstruction
Qingchao Jiang
Zhishuo Xu
Zhiying Zhu
Ning Chen
Haoyue Wang
Zhongjie Ba
143
0
0
10 Apr 2025
PixelFlow: Pixel-Space Generative Models with Flow
Shoufa Chen
Chongjian Ge
Shilong Zhang
Peize Sun
Ping Luo
VLM
DRL
206
15
0
10 Apr 2025
Domain Generalization via Discrete Codebook Learning
Shaocong Long
Qianyu Zhou
Xikun Jiang
Chenhao Ying
Lizhuang Ma
Yuan Luo
174
1
0
09 Apr 2025
OmniSVG: A Unified Scalable Vector Graphics Generation Model
Yiying Yang
Wei Cheng
Sijin Chen
Xianfang Zeng
Jiaxu Zhang
Liao Wang
Gang Yu
Jiabo He
Xingjun Ma
Yu Jiang
VLM
417
20
0
08 Apr 2025