Taming Transformers for High-Resolution Image Synthesis

Computer Vision and Pattern Recognition (CVPR), 2020
17 December 2020
Patrick Esser
Robin Rombach
Björn Ommer
    ViT
ArXiv (abs) · PDF · HTML · GitHub (6185★)

Papers citing "Taming Transformers for High-Resolution Image Synthesis"

50 / 2,402 papers shown
Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning
Wang Lin
Liyu Jia
Wentao Hu
Kaihang Pan
Zhongqi Yue
Wei Zhao
Jingyuan Chen
Fei Wu
Hanwang Zhang
VGen
304
8
0
22 Apr 2025
MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World
Computer Vision and Pattern Recognition (CVPR), 2025
Tao Lu
Manan Shah
R. V. Babu
297
1
0
21 Apr 2025
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Computer Vision and Pattern Recognition (CVPR), 2025
Kaihang Pan
Wang Lin
Zhongqi Yue
Tenglong Ao
Liyu Jia
Wei Zhao
Juncheng Billy Li
Siliang Tang
Hanwang Zhang
315
18
0
20 Apr 2025
Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis
Jingjing Ren
Wenbo Li
Zhongdao Wang
Haoze Sun
Bangzhen Liu
...
Aoxue Li
Shifeng Zhang
Bin Shao
Yong Guo
Lei Zhu
VGen
313
7
0
20 Apr 2025
The Path to Reconciling Quality and Safety in Text-to-Image Generation: Dataset, Method, and Evaluation
Shouwei Ruan
Zhenyu Wu
Yao Huang
Ruochen Zhang
Yitong Sun
Caixin Kang
Shiji Zhao
Xingxing Wei
EGVM
406
1
0
19 Apr 2025
Towards Explainable Fake Image Detection with Multi-Modal Large Language Models
Yikun Ji
Y. Hong
Jiahui Zhan
H. Chen
Jun Lan
Huijia Zhu
Weiqiang Wang
Guang Dai
Jianfu Zhang
MLLM, LRM
512
4
0
19 Apr 2025
Image Editing with Diffusion Models: A Survey
Jia Wang
Jie Hu
Xiaoqi Ma
Hanghang Ma
Xiaoming Wei
Enhua Wu
322
5
0
17 Apr 2025
SkyReels-V2: Infinite-length Film Generative Model
Guibin Chen
D. Lin
Jiangping Yang
Chunze Lin
J. Zhu
...
Di Qiu
Debang Li
Zhengcong Fei
Yang Li
Yahui Zhou
DiffM, VGen
505
76
0
17 Apr 2025
Hierarchical Vector Quantized Graph Autoencoder with Annealing-Based Code Selection
The Web Conference (WWW), 2025
Long Zeng
Jianxiang Yu
Jiapeng Zhu
Qingsong Zhong
Xiang Li
251
6
0
17 Apr 2025
Autoregressive Distillation of Diffusion Transformers
Computer Vision and Pattern Recognition (CVPR), 2025
Yeongmin Kim
Sotiris Anagnostidis
Yuming Du
Edgar Schönfeld
Jonas Kohler
Markos Georgopoulos
Albert Pumarola
Ali K. Thabet
A. Sanakoyeu
309
2
0
15 Apr 2025
Deep Generative Model-Based Generation of Synthetic Individual-Specific Brain MRI Segmentations
Ruijie Wang
Luca Rossetto
Susan Mérillat
Christina Röcke
Mike Martin
Abraham Bernstein
DiffM, MedIm
503
0
0
15 Apr 2025
Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception
International Conference on Learning Representations (ICLR), 2025
Ziqi Pang
Xin Xu
Yu-Xiong Wang
DiffM
486
1
0
15 Apr 2025
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL
Junke Wang
Zhi Tian
Xinyu Wang
Xinyu Zhang
Weilin Huang
Zuxuan Wu
Yu Jiang
VGen
408
62
0
15 Apr 2025
Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing
Taihang Hu
Linxuan Li
Kai Wang
Yaxing Wang
Jian Yang
Ming-Ming Cheng
DiffM, VGen
298
4
0
14 Apr 2025
InstructEngine: Instruction-driven Text-to-Image Alignment
Xingyu Lu
Yihan Hu
Yuanxing Zhang
Kaiyu Jiang
Changyi Liu
...
Bin Wen
C. Yuan
Fan Yang
Yan Li
Di Zhang
377
1
0
14 Apr 2025
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers
Xingjian Leng
Jaskirat Singh
Yunzhong Hou
Zhenchang Xing
Saining Xie
Liang Zheng
410
68
0
14 Apr 2025
D$^2$iT: Dynamic Diffusion Transformer for Accurate Image Generation
Computer Vision and Pattern Recognition (CVPR), 2025
Weinan Jia
Mengqi Huang
Nan Chen
Lei Zhang
Zhendong Mao
306
6
0
13 Apr 2025
Generation of Musical Timbres using a Text-Guided Diffusion Model
Weixuan Yuan
Qadeer Khan
Vladimir Golkov
DiffM
223
0
0
12 Apr 2025
Head-Aware KV Cache Compression for Efficient Visual Autoregressive Modeling
Ziran Qin
Youru Lv
Mingbao Lin
Zeren Zhang
Danping Zou
Weiyao Lin
VLM
301
5
0
12 Apr 2025
Diffusion Models for Robotic Manipulation: A Survey
Frontiers in Robotics and AI (Front. Robot. AI), 2025
Rosa Wolf
Yitian Shi
Sheng Liu
Rania Rayyes
514
25
0
11 Apr 2025
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Tianwei Xiong
Jun Hao Liew
Zilong Huang
Jiashi Feng
Xihui Liu
365
22
0
11 Apr 2025
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft
Junliang Guo
Yang Ye
Tianyu He
Haoyu Wu
Yushu Jiang
Tim Pearce
Li Zhao
VGen, SyDa
321
39
0
11 Apr 2025
ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration
Yongsheng Yu
Haitian Zheng
Zhifei Zhang
Jianming Zhang
Yuqian Zhou
Connelly Barnes
Yixiao Liu
Wei Xiong
Zhe Lin
Jiebo Luo
360
1
0
11 Apr 2025
Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging
Gabriele Lozupone
Alessandro Bria
F. Fontanella
Frederick J.A. Meijer
C. D. Stefano
Henkjan Huisman
DiffM, MedIm
191
2
0
11 Apr 2025
PixelFlow: Pixel-Space Generative Models with Flow
Shoufa Chen
Chongjian Ge
Shilong Zhang
Peize Sun
Ping Luo
VLM, DRL
259
17
0
10 Apr 2025
Model Discrepancy Learning: Synthetic Faces Detection Based on Multi-Reconstruction
Qingchao Jiang
Zhishuo Xu
Zhiying Zhu
Ning Chen
Haoyue Wang
Zhongjie Ba
186
1
0
10 Apr 2025
Domain Generalization via Discrete Codebook Learning
Shaocong Long
Qianyu Zhou
Xikun Jiang
Chenhao Ying
Lizhuang Ma
Yuan Luo
247
1
0
09 Apr 2025
A Training-Free Style-aligned Image Generation with Scale-wise Autoregressive Model
Jihun Park
Jongmin Gim
Kyoungmin Lee
Minseok Oh
Minwoo Choi
Jaeyeul Kim
Woo Chool Park
Sunghoon Im
DiffM
268
5
1
08 Apr 2025
CamC2V: Context-aware Controllable Video Generation
Luis Denninger
Sina Mokhtarzadeh Azar
Juergen Gall
VGen
325
0
0
08 Apr 2025
OmniSVG: A Unified Scalable Vector Graphics Generation Model
Yiying Yang
Wei Cheng
Sijin Chen
Xianfang Zeng
Jiaxu Zhang
Liao Wang
Gang Yu
Jiabo He
Xingjun Ma
Yu Jiang
VLM
517
22
0
08 Apr 2025
Generative Adversarial Networks with Limited Data: A Survey and Benchmarking
Omar de Mitri
Ruyu Wang
Marco F. Huber
289
0
0
07 Apr 2025
Studying Image Diffusion Features for Zero-Shot Video Object Segmentation
Thanos Delatolas
Vicky S. Kalogeiton
Dim P. Papadopoulos
DiffM, VOS
335
3
0
07 Apr 2025
FluentLip: A Phonemes-Based Two-stage Approach for Audio-Driven Lip Synthesis with Optical Flow Consistency
Shiyan Liu
Rui Qu
Yan Jin
292
0
0
06 Apr 2025
UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding
Yang Jiao
Haibo Qiu
Zequn Jie
Tian Jin
Yue Yu
Lin Ma
Yu Jiang
289
29
0
06 Apr 2025
Walk Before You Dance: High-fidelity and Editable Dance Synthesis via Generative Masked Motion Prior
Foram Niravbhai Shah
Parshwa Shah
Muhammad Usama Saleem
Ekkasit Pinyoanuntapong
Pu Wang
Hongfei Xue
Ahmed Helmy
VGen
696
2
0
06 Apr 2025
Scaling Federated Learning Solutions with Kubernetes for Synthesizing Histopathology Images
Andrei Preda
Iulian-Marius Taiatu
Dumitru-Clementin Cercel
FedML, MedIm
200
1
0
05 Apr 2025
3D Scene Understanding Through Local Random Access Sequence Modeling
Wanhee Lee
Klemen Kotar
R. Venkatesh
Jared Watrous
Honglin Chen
Khai Loong Aw
Daniel L. K. Yamins
3DV
240
3
0
04 Apr 2025
SkyReels-A2: Compose Anything in Video Diffusion Transformers
Zhengcong Fei
Didong Li
Di Qiu
Jiadong Wang
Yikun Dou
...
Jinfeng Xu
Mingyuan Fan
Guibin Chen
Yang Li
Yahui Zhou
DiffM, VGen
332
34
0
03 Apr 2025
Moment Quantization for Video Temporal Grounding
Xiaolong Sun
Le Wang
Sanping Zhou
Liushuai Shi
Kun Xia
Mengnan Liu
Yabing Wang
Gang Hua
MQ
240
1
0
03 Apr 2025
Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization
Kangle Deng
Hsueh-Ti Derek Liu
Yiheng Zhu
Xiaoxia Sun
Chong Shang
Kiran Bhat
Deva Ramanan
Jun-Yan Zhu
Maneesh Agrawala
Tinghui Zhou
336
2
0
03 Apr 2025
Fine-Tuning Visual Autoregressive Models for Subject-Driven Generation
Jiwoo Chung
Sangeek Hyun
Hyunjun Kim
Eunseo Koh
MinKyu Lee
Jae-Pil Heo
321
9
0
03 Apr 2025
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Dongchao Yang
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
341
15
0
03 Apr 2025
Detecting Lip-Syncing Deepfakes: Vision Temporal Transformer for Analyzing Mouth Inconsistencies
Soumyya Kanti Datta
Shan Jia
Siwei Lyu
272
2
0
02 Apr 2025
FlowR: Flowing from Sparse to Dense 3D Reconstructions
Tobias Fischer
Samuel Rota Buló
Yung-Hsu Yang
Nikhil Varma Keetha
Lorenzo Porzi
Norman Muller
Katja Schwarz
Jonathon Luiten
Marc Pollefeys
Peter Kontschieder
3DGS
373
7
0
02 Apr 2025
MuTri: Multi-view Tri-alignment for OCT to OCTA 3D Image Translation
Computer Vision and Pattern Recognition (CVPR), 2025
Zhaoyu Chen
Hualiang Wang
Chubin Ou
Xiaomeng Li
272
3
0
02 Apr 2025
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement
Runhui Huang
Chunwei Wang
Junwei Yang
Guansong Lu
Yunlong Yuan
...
Lu Hou
Wei Zhang
Lanqing Hong
Hengshuang Zhao
Hang Xu
MLLM
366
31
0
02 Apr 2025
Instruction-Guided Autoregressive Neural Network Parameter Generation
Soro Bedionita
Bruno Andreis
Song Chong
Sung Ju Hwang
DiffM
276
1
0
02 Apr 2025
Learned Image Compression with Dictionary-based Entropy Model
Computer Vision and Pattern Recognition (CVPR), 2025
Jingbo Lu
Leheng Zhang
Xingyu Zhou
Mu Li
Wen Li
Shuhang Gu
314
13
0
01 Apr 2025
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Computer Vision and Pattern Recognition (CVPR), 2025
Siyuan Li
Guang Dai
Zedong Wang
Juanxi Tian
Cheng Tan
...
Chang Yu
Qingsong Xie
Haonan Lu
Haoqian Wang
Zhen Lei
295
6
0
01 Apr 2025
AP-CAP: Advancing High-Quality Data Synthesis for Animal Pose Estimation via a Controllable Image Generation Pipeline
Lei Wang
Yujie Zhong
Xiaopeng Sun
Jingchun Cheng
C. Feng
Qiong Cao
Lin Ma
Zhaoxin Fan
216
0
0
01 Apr 2025