2012.09841
Taming Transformers for High-Resolution Image Synthesis
Computer Vision and Pattern Recognition (CVPR), 2021
17 December 2020
Patrick Esser
Robin Rombach
Bjorn Ommer
ViT
Github (6185★)
Papers citing
"Taming Transformers for High-Resolution Image Synthesis"
50 / 2,404 papers shown
WorldScore: A Unified Evaluation Benchmark for World Generation
Haoyi Duan
Hong-Xing Yu
Sirui Chen
L. Fei-Fei
Jiajun Wu
VGen
402
46
0
01 Apr 2025
Style Quantization for Data-Efficient GAN Training
Computer Vision and Pattern Recognition (CVPR), 2025
Jian Wang
Xin Lan
Jizhe Zhou
Yuxin Tian
Jiancheng Lv
260
2
0
31 Mar 2025
Training-Free Text-Guided Image Editing with Visual Autoregressive Model
Yufei Wang
Lanqing Guo
Zhihao Li
Jiaxing Huang
Pichao Wang
Bihan Wen
Jingchao Wang
DiffM
287
8
0
31 Mar 2025
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning
Hang Guo
Yawei Li
Taolin Zhang
Jiadong Wang
Tao Dai
Shu-Tao Xia
Luca Benini
446
17
0
30 Mar 2025
HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation
Computer Vision and Pattern Recognition (CVPR), 2025
Hongwei Zheng
Han Li
Wenrui Dai
Ziyang Zheng
Chenglin Li
Junni Zou
Hongkai Xiong
3DH
270
6
0
30 Mar 2025
Beyond Synthetic Replays: Turning Diffusion Features into Few-Shot Class-Incremental Learning Knowledge
Junsu Kim
Yunhoe Ku
Dongyoon Han
Seungryul Baek
DiffM
431
1
0
30 Mar 2025
LSNet: See Large, Focus Small
Computer Vision and Pattern Recognition (CVPR), 2025
Ao Wang
Hui Chen
Zijia Lin
Jiawei Han
Guiguang Ding
304
19
0
29 Mar 2025
Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities
Raman Dutt
Harleen Hanspal
Guoxuan Xia
Petru-Daniel Tudosiu
Alexander Black
Yongxin Yang
Jingyu Sun
Sarah Parisot
MoE
361
0
0
28 Mar 2025
Data Quality Matters: Quantifying Image Quality Impact on Machine Learning Performance
Christian Steinhauser
Philipp Reis
Hubert Padusinski
Jacob Langner
Eric Sax
119
4
0
28 Mar 2025
Arch-LLM: Taming LLMs for Neural Architecture Generation via Unsupervised Discrete Representation Learning
Deshani Geethika Poddenige
Sachith Seneviratne
Damith A. Senanayake
Mahesan Niranjan
PN Suganthan
Saman K. Halgamuge
248
0
0
28 Mar 2025
Evaluating Text-to-Image and Text-to-Video Synthesis with a Conditional Fréchet Distance
Jaywon Koo
J. Hernandez
Moayed Haji-Ali
Ziyan Yang
Vicente Ordonez
EGVM
339
0
0
27 Mar 2025
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Size Wu
Feiyu Xiong
Lumin Xu
Sheng Jin
Zhonghua Wu
Qingyi Tao
Wentao Liu
Wei Li
Chen Change Loy
VGen
988
37
0
27 Mar 2025
Can Video Diffusion Model Reconstruct 4D Geometry?
Jinjie Mai
Wenxuan Zhu
Haozhe Liu
Bing Li
Cheng Zheng
Jürgen Schmidhuber
Bernard Ghanem
VGen
MDE
313
7
0
27 Mar 2025
Efficient Multi-Instance Generation with Janus-Pro-Dirven Prompt Parsing
Fan Qi
Yu Duan
Changsheng Xu
DiffM
299
0
0
27 Mar 2025
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
Dian Zheng
Ziqi Huang
Hongbo Liu
Kai Zou
Yinan He
...
Jingwen He
Wei-Shi Zheng
Botian Shi
Yu Qiao
Ziwei Liu
EGVM
VGen
339
95
0
27 Mar 2025
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models
Alex Jinpeng Wang
Linjie Li
Zhiyong Yang
Lijuan Wang
Min Li
DiffM
290
2
0
26 Mar 2025
MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation
Computer Vision and Pattern Recognition (CVPR), 2025
Jinnan Chen
Lingting Zhu
Zeyu Hu
Shengju Qian
Yuxiao Chen
Xin Wang
G. Lee
507
7
0
26 Mar 2025
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Yuchao Gu
Weijia Mao
Mike Zheng Shou
VGen
506
64
0
25 Mar 2025
VTD-CLIP: Video-to-Text Discretization via Prompting CLIP
Wencheng Zhu
Yuexin Wang
Hongxuan Li
Q. Hu
CLIP
371
1
0
24 Mar 2025
DiffV2IR: Visible-to-Infrared Diffusion Model via Vision-Language Understanding
Lingyan Ran
Lidong Wang
Guangcong Wang
Peng Wang
Yujiao Shi
287
4
0
24 Mar 2025
From Fragment to One Piece: A Survey on AI-Driven Graphic Design
Xingxing Zou
Wen Zhang
Nanxuan Zhao
349
4
0
24 Mar 2025
Uncertainty-guided Perturbation for Image Super-Resolution Diffusion Model
Computer Vision and Pattern Recognition (CVPR), 2025
Leheng Zhang
Weiyi You
Kexuan Shi
Shuhang Gu
361
12
0
24 Mar 2025
Causal Links Between Anthropogenic Emissions and Air Pollution Dynamics in Delhi
Sourish Das
Sudeep Shukla
Alka Yadav
Anirban Chakraborti
AI4CE
209
0
0
24 Mar 2025
Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2025
Jinjin Zhang
Qiuyu Huang
Junjie Liu
Xiefan Guo
Di Huang
365
27
0
24 Mar 2025
Panorama Generation From NFoV Image Done Right
Computer Vision and Pattern Recognition (CVPR), 2025
Dian Zheng
Cheng Zhang
Xiao-Ming Wu
Cao Li
Chengfei Lv
Jian-Fang Hu
Wei-Shi Zheng
DiffM
330
7
0
24 Mar 2025
SG-Tailor: Inter-Object Commonsense Relationship Reasoning for Scene Graph Manipulation
Haoliang Shang
Hanyu Wu
Guangyao Zhai
Boyang Sun
Fangjinhua Wang
F. Tombari
Marc Pollefeys
310
1
0
23 Mar 2025
CODA: Repurposing Continuous VAEs for Discrete Tokenization
Zeyu Liu
Zanlin Ni
Yeguo Hua
Xin Deng
Xiao Ma
Cheng Zhong
Gao Huang
313
6
0
22 Mar 2025
DVG-Diffusion: Dual-View Guided Diffusion Model for CT Reconstruction from X-Rays
Xing Xie
Jiawei Liu
Huijie Fan
Zhi Han
Yandong Tang
Liangqiong Qu
DiffM
MedIm
323
2
0
22 Mar 2025
Halton Scheduler For Masked Generative Image Transformer
International Conference on Learning Representations (ICLR), 2025
Victor Besnier
Mickael Chen
David Hurych
Eduardo Valle
Matthieu Cord
278
22
0
21 Mar 2025
ProDehaze: Prompting Diffusion Models Toward Faithful Image Dehazing
Tianwen Zhou
Jing Wang
Songtao Wu
Kuanhong Xu
DiffM
319
0
0
21 Mar 2025
Structure Is Not Enough: Leveraging Behavior for Neural Network Weight Reconstruction
Léo Meynent
Ivan Melev
Konstantin Schurholt
Göran Kauermann
Damian Borth
370
5
0
21 Mar 2025
D2C: Unlocking the Potential of Continuous Autoregressive Image Generation with Discrete Tokens
Panpan Wang
Liqiang Niu
Fandong Meng
Jinan Xu
Yufeng Chen
Jie Zhou
DiffM
318
0
0
21 Mar 2025
Zero-Shot Styled Text Image Generation, but Make It Autoregressive
Computer Vision and Pattern Recognition (CVPR), 2025
Vittorio Pippi
Fabio Quattrini
S. Cascianelli
Alessio Tonioni
Rita Cucchiara
336
9
0
21 Mar 2025
PromptMobile: Efficient Promptus for Low Bandwidth Mobile Video Streaming
Asia-Pacific Workshop on Networking (AN), 2025
Liming Liu
Jiangkai Wu
Haoyang Wang
Peiheng Wang
Xinggong Zhang
265
0
0
20 Mar 2025
Tokenize Image as a Set
Zigang Geng
Mengde Xu
Han Hu
Shuyang Gu
DiffM
229
0
0
20 Mar 2025
Scale-wise Distillation of Diffusion Models
Nikita Starodubcev
Denis Kuznedelev
Artem Babenko
Dmitry Baranchuk
DiffM
296
4
0
20 Mar 2025
Improving Autoregressive Image Generation through Coarse-to-Fine Token Prediction
Ziyao Guo
Jianchao Tan
Michael Qizhe Shieh
218
5
0
20 Mar 2025
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
Yanjie Wang
Zhijie Lin
Yao Teng
Yuanzhi Zhu
Shuhuai Ren
Jiashi Feng
Xihui Liu
405
17
0
20 Mar 2025
LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images
Leyang Wang
Joice Lin
DiffM
274
0
0
20 Mar 2025
Unleashing Vecset Diffusion Model for Fast Shape Generation
Zeqiang Lai
Yunfei Zhao
Zibo Zhao
Haolin Liu
Fuyun Wang
...
Jinwei Huang
Yuhong Liu
Jie Jiang
Chunchao Guo
Xiangyu Yue
DiffM
1.1K
14
0
20 Mar 2025
CAM-Seg: A Continuous-valued Embedding Approach for Semantic Image Generation
Masud Ahmed
Zahid Hasan
Syed Arefinul Haque
A. Faridee
S. Purushotham
Suya You
Nirmalya Roy
458
0
0
19 Mar 2025
The Change You Want To Detect: Semantic Change Detection In Earth Observation With Hybrid Data Generation
Computer Vision and Pattern Recognition (CVPR), 2025
Benidir Yanis
Gonthier Nicolas
Mallet Clement
367
8
0
19 Mar 2025
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Jin Wang
Chenghui Lv
Xian Li
Shichao Dong
Huadong Li
Kelu Yao
Chao Li
Wenqi Shao
Ping Luo
425
10
0
19 Mar 2025
Generating Multimodal Driving Scenes via Next-Scene Prediction
Computer Vision and Pattern Recognition (CVPR), 2025
Yanhao Wu
Haoyang Zhang
Tianwei Lin
Lichao Huang
Shujie Luo
Rui Wu
Congpei Qiu
Wei Ke
Tong Zhang
VGen
337
6
0
19 Mar 2025
3D Engine-ready Photorealistic Avatars via Dynamic Textures
Yifan Wang
Ivan Molodetskikh
Ondrej Texler
Dimitar Dinev
311
0
0
19 Mar 2025
MMAIF: Multi-task and Multi-degradation All-in-One for Image Fusion with Language Guidance
Zihan Cao
Yu Zhong
Liang Luo
Liang-Jian Deng
287
2
0
19 Mar 2025
Conjuring Positive Pairs for Efficient Unification of Representation Learning and Image Synthesis
Imanol G. Estepa
Jesús M. Rodríguez-de-Vera
Ignacio Sarasúa
Bhalaji Nagarajan
Petia Radeva
449
0
0
19 Mar 2025
Exploiting Diffusion Prior for Real-World Image Dehazing with Unpaired Training
AAAI Conference on Artificial Intelligence (AAAI), 2025
Yunwei Lan
Zhigao Cui
Yu Xie
Jialun Peng
Nian Wang
Xin Luo
Dong Liu
DiffM
227
21
0
19 Mar 2025
Learn Your Scales: Towards Scale-Consistent Generative Novel View Synthesis
Fereshteh Forghani
Jason J. Yu
Tristan Aumentado-Armstrong
Konstantinos G. Derpanis
Marcus A. Brubaker
DiffM
339
0
0
19 Mar 2025
Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization
Computer Vision and Pattern Recognition (CVPR), 2025
Feifei Li
Mi Zhang
Yiming Sun
Min Yang
DiffM
310
6
0
19 Mar 2025