Taming Transformers for High-Resolution Image Synthesis

Computer Vision and Pattern Recognition (CVPR), 2020
17 December 2020
Patrick Esser
Robin Rombach
Bjorn Ommer
    ViT
ArXiv (abs) | PDF | HTML | Github (6185★)

Papers citing "Taming Transformers for High-Resolution Image Synthesis"

50 / 2,404 papers shown
Visual Autoregressive Models Beat Diffusion Models on Inference Time Scaling
Erik Riise
Mehmet Onurcan Kaya
Dim P. Papadopoulos
315
0
0
19 Oct 2025
SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization
Wenxi Chen
X. Wang
Ruiqi Yan
Yihao Chen
Zhikang Niu
...
Yuzhe Liang
Hanlin Wen
Shunshun Yin
Ming Tao
Xie Chen
166
3
0
19 Oct 2025
ReCon: Region-Controllable Data Augmentation with Rectification and Alignment for Object Detection
Haowei Zhu
Tianxiang Pan
Rui Qin
Jun-Hai Yong
Bin Wang
DiffM
199
1
0
17 Oct 2025
Exploring Conditions for Diffusion models in Robotic Control
Heeseong Shin
Byeongho Heo
Dongyoon Han
Seungryong Kim
Taekyung Kim
200
0
0
17 Oct 2025
Adapting Self-Supervised Representations as a Latent Space for Efficient Generation
Ming Gui
Johannes Schusterbauer
Timy Phan
Felix Krause
J. Susskind
Miguel Angel Bautista
Bjorn Ommer
204
1
0
16 Oct 2025
Vector Quantization in the Brain: Grid-like Codes in World Models
Xiangyuan Peng
Xingsi Dong
Si Wu
143
0
0
16 Oct 2025
LightQANet: Quantized and Adaptive Feature Learning for Low-Light Image Enhancement
X. Wu
Zhihui Lai
Xianxu Hou
Jie Zhou
Ya-Nan Zhang
LinLin Shen
114
1
0
16 Oct 2025
ScaleWeaver: Weaving Efficient Controllable T2I Generation with Multi-Scale Reference Attention
Keli Liu
Zhendong Wang
Wengang Zhou
Shaodong Xu
Ruixiao Dong
Houqiang Li
DiffM
151
0
0
16 Oct 2025
Universal Image Restoration Pre-training via Masked Degradation Classification
J. Hu
Zhengjian Yao
Lujia Jin
Yinghao Chen
Yanye Lu
138
1
0
15 Oct 2025
UniCalli: A Unified Diffusion Framework for Column-Level Generation and Recognition of Chinese Calligraphy
Tianshuo Xu
Kai Wang
Zhifei Chen
Leyi Wu
Tianshui Wen
Fei Chao
Ying-Cong Chen
DiffM
95
0
0
15 Oct 2025
Group-Wise Optimization for Self-Extensible Codebooks in Vector Quantized Models
Hong-Kai Zheng
Piji Li
146
0
0
15 Oct 2025
Reinforcement Learning Meets Masked Generative Models: Mask-GRPO for Text-to-Image Generation
Yifu Luo
Xinhao Hu
Keyu Fan
Haoyuan Sun
Zeyu Chen
Bo Xia
Tiantian Zhang
Yongzhe Chang
Xueqian Wang
144
2
0
15 Oct 2025
CanvasMAR: Improving Masked Autoregressive Video Generation With Canvas
Zian Li
Muhan Zhang
DiffM, VGen
158
0
0
15 Oct 2025
End-to-End Multi-Modal Diffusion Mamba
Chunhao Lu
Qiang Lu
Meichen Dong
Jake Luo
141
3
0
15 Oct 2025
NeuroRVQ: Multi-Scale EEG Tokenization for Generative Large Brainwave Models
Konstantinos Barmpas
Na Lee
Alexandros Koliousis
Yannis Panagakis
Dimitrios A. Adamos
N. Laskaris
Stefanos Zafeiriou
166
1
0
15 Oct 2025
Your VAR Model is Secretly an Efficient and Explainable Generative Classifier
Yi-Chung Chen
David I. Inouye
Jing Gao
DiffM, VLM
140
0
0
14 Oct 2025
BIGFix: Bidirectional Image Generation with Token Fixing
Victor Besnier
David Hurych
Andrei Bursuc
Eduardo Valle
VGen
159
0
0
14 Oct 2025
What If : Understanding Motion Through Sparse Interactions
S. A. Baumann
Nick Stracke
Timy Phan
Bjorn Ommer
138
0
0
14 Oct 2025
Self-Supervised Selective-Guided Diffusion Model for Old-Photo Face Restoration
Wenjie Li
Xiangyi Wang
Heng Guo
Guangwei Gao
Zhanyu Ma
DiffM
168
4
0
14 Oct 2025
Diffusion Transformers with Representation Autoencoders
Boyang Zheng
Nanye Ma
Shengbang Tong
Saining Xie
DiffM
214
45
0
13 Oct 2025
ProteinAE: Protein Diffusion Autoencoders for Structure Encoding
Shaoning Li
Le Zhuo
Yusong Wang
Mingyu Li
Xinheng He
Fandi Wu
Jiaming Song
Pheng-Ann Heng
DiffM
140
0
0
12 Oct 2025
Are Video Models Emerging as Zero-Shot Learners and Reasoners in Medical Imaging?
Yuxiang Lai
Jike Zhong
Ming Li
Yuheng Li
Xiaofeng Yang
VGen, MedIm
163
3
0
11 Oct 2025
Generative Latent Video Compression
Zongyu Guo
Zhaoyang Jia
Jiahao Li
Xiaoyi Zhang
Bin Li
Yan Lu
VGen
160
1
0
11 Oct 2025
Lesion-Aware Post-Training of Latent Diffusion Models for Synthesizing Diffusion MRI from CT Perfusion
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
J. Lee
Hyunwoong Kim
Hyungjin Chung
Heeseong Eom
Joon Jang
Chul-Ho Sohn
Kyu Sung Choi
MedIm
87
0
0
10 Oct 2025
iMoWM: Taming Interactive Multi-Modal World Model for Robotic Manipulation
Chuanrui Zhang
Zhengxian Wu
Guanxing Lu
Yansong Tang
Ziwei Wang
VGen
115
0
0
10 Oct 2025
Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy
Xiaoxiao Ma
Feng Zhao
Pengyang Ling
Haibo Qiu
Zhixiang Wei
Hu Yu
Jie Huang
Zhixiong Zeng
Lin Ma
180
2
0
10 Oct 2025
Optimal Stopping in Latent Diffusion Models
Yu-Han Wu
Quentin Berthet
Gérard Biau
Claire Boyer
Romuald Elie
Pierre Marion
140
0
0
09 Oct 2025
Don't Run with Scissors: Pruning Breaks VLA Models but They Can Be Recovered
Jason J. Jabbour
Dong-Ki Kim
Max Smith
Jay Patrikar
Radhika Ghosal
Youhui Wang
Ali Agha
Vijay Janapa Reddi
Shayegan Omidshafiei
VLM
142
1
0
09 Oct 2025
Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer
Ziyuan Huang
Dandan Zheng
Cheng Zou
Rui Liu
Xiaolong Wang
...
Jiajia Liu
Qingpei Guo
Ming-Hsuan Yang
Jingdong Chen
Jun Zhou
162
10
0
08 Oct 2025
Heptapod: Language Modeling on Visual Signals
Yongxin Zhu
J. Chen
Yuanzhe Chen
Zhuo Chen
Dongya Jia
Jian Cong
Xiaobin Zhuang
Yuping Wang
Yuping Wang
VLM
162
0
0
08 Oct 2025
Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications
IEEE Access, 2025
Kento Kawaharazuka
Jihoon Oh
Jun Yamada
Ingmar Posner
Yuke Zhu
LM&Ro
305
30
0
08 Oct 2025
We Can Hide More Bits: The Unused Watermarking Capacity in Theory and in Practice
Aleksandar Petrov
Pierre Fernandez
Tomáš Souček
Hady ElSahar
155
2
0
07 Oct 2025
Efficient Conditional Generation on Scale-based Visual Autoregressive Models
Jiaqi Liu
Tao Huang
Chang Xu
DiffM
203
0
0
07 Oct 2025
Riddled basin geometry sets fundamental limits to predictability and reproducibility in deep learning
Andrew Ly
Pulin Gong
AI4CE
187
0
0
07 Oct 2025
Parallel Tokenizers: Rethinking Vocabulary Design for Cross-Lingual Transfer
Muhammad Dehan Al Kautsar
Fajri Koto
214
1
0
07 Oct 2025
BlockGPT: Spatio-Temporal Modelling of Rainfall via Frame-Level Autoregression
Cristian Meo
Varun Sarathchandran
Avijit Majhi
Shao Hung
Carlo Saccardi
R. Imhoff
Roberto Deidda
R. Uijlenhoet
Justin Dauwels
AI4TS
192
0
0
07 Oct 2025
$\bf{D^3}$QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection
Yanran Zhang
Bingyao Yu
Yu Zheng
Wenzhao Zheng
Yueqi Duan
Lei Chen
Jie Zhou
Jiwen Lu
MQ
193
1
0
07 Oct 2025
CodeFormer++: Blind Face Restoration Using Deformable Registration and Deep Metric Learning
Venkata Bharath Reddy Reddem
Akshay P Sarashetti
Ranjith Merugu
Amit Satish Unde
123
0
0
06 Oct 2025
VChain: Chain-of-Visual-Thought for Reasoning in Video Generation
Longxiang Zhang
Ning Yu
Gordon Chen
Haonan Qiu
P. Debevec
Ziwei Liu
VGen, LRM
87
7
0
06 Oct 2025
REAR: Rethinking Visual Autoregressive Models via Generator-Tokenizer Consistency Regularization
Qiyuan He
Y. Li
Haotian Ye
Jinghao Wang
Xinyao Liao
Pheng-Ann Heng
Stefano Ermon
James Zou
Angela Yao
DiffM, VGen
237
2
0
06 Oct 2025
Demystifying MaskGIT Sampler and Beyond: Adaptive Order Selection in Masked Diffusion
Satoshi Hayakawa
Yuhta Takida
Masaaki Imaizumi
Hiromi Wakaki
Yuki Mitsufuji
DiffM
362
0
0
06 Oct 2025
Language Model Based Text-to-Audio Generation: Anti-Causally Aligned Collaborative Residual Transformers
Juncheng Wang
Chao Xu
Cheng Yu
Zhe Hu
Haoyu Xie
Guoqi Yu
Lei Shang
Shujun Wang
DiffM
170
2
0
06 Oct 2025
SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization
Théophane Vallaeys
Jakob Verbeek
Matthieu Cord
DiffM
239
3
0
06 Oct 2025
Bridging Text and Video Generation: A Survey
Nilay Kumar
Priyansh Bhandari
G. Maragatham
VGen
264
0
0
06 Oct 2025
MASC: Boosting Autoregressive Image Generation with a Manifold-Aligned Semantic Clustering
Lixuan He
Shikang Zheng
Linfeng Zhang
162
0
0
05 Oct 2025
Product-Quantised Image Representation for High-Quality Image Synthesis
Denis Zavadski
Nikita Philip Tatsch
Carsten Rother
107
0
0
03 Oct 2025
MelTok: 2D Tokenization for Single-Codebook Audio Compression
Jingyi Li
Zhiyuan Zhao
Yunfei Liu
Lijian Lin
Ye Zhu
Jiahao Wu
Qiuqiang Kong
Yu Li
Y. Li
311
0
0
02 Oct 2025
Growing Visual Generative Capacity for Pre-Trained MLLMs
Hanyu Wang
Jiaming Han
Ziyan Yang
Qi Zhao
Shanchuan Lin
Xiangyu Yue
Abhinav Shrivastava
Zhenheng Yang
Hao Chen
VLM
217
1
0
02 Oct 2025
Variational Secret Common Randomness Extraction
Xinyang Li
Amin Seffo
Peter J. Gu
Yiqi Chen
U. Mönich
Holger Boche
115
0
0
02 Oct 2025
Ultra-Efficient Decoding for End-to-End Neural Compression and Reconstruction
Ethan G Rogers
Cheng Wang
131
0
0
01 Oct 2025