Taming Transformers for High-Resolution Image Synthesis

Computer Vision and Pattern Recognition (CVPR), 2021

17 December 2020

Papers citing "Taming Transformers for High-Resolution Image Synthesis"

50 / 2,404 papers shown

Visual Autoregressive Models Beat Diffusion Models on Inference Time Scaling

Erik Riise

Mehmet Onurcan Kaya

Dim P. Papadopoulos

315

19 Oct 2025

SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization

...

166

19 Oct 2025

ReCon: Region-Controllable Data Augmentation with Rectification and Alignment for Object Detection

199

17 Oct 2025

Exploring Conditions for Diffusion models in Robotic Control

200

17 Oct 2025

Adapting Self-Supervised Representations as a Latent Space for Efficient Generation

Ming Gui

Johannes Schusterbauer

Timy Phan

Felix Krause

J. Susskind

Miguel Angel Bautista

Bjorn Ommer

204

16 Oct 2025

Vector Quantization in the Brain: Grid-like Codes in World Models

Xiangyuan Peng

Xingsi Dong

Si Wu

143

16 Oct 2025

LightQANet: Quantized and Adaptive Feature Learning for Low-Light Image Enhancement

114

16 Oct 2025

ScaleWeaver: Weaving Efficient Controllable T2I Generation with Multi-Scale Reference Attention

151

16 Oct 2025

Universal Image Restoration Pre-training via Masked Degradation Classification

138

15 Oct 2025

UniCalli: A Unified Diffusion Framework for Column-Level Generation and Recognition of Chinese Calligraphy

15 Oct 2025

Group-Wise Optimization for Self-Extensible Codebooks in Vector Quantized Models

Hong-Kai Zheng

Piji Li

146

15 Oct 2025

Reinforcement Learning Meets Masked Generative Models: Mask-GRPO for Text-to-Image Generation

144

15 Oct 2025

CanvasMAR: Improving Masked Autoregressive Video Generation With Canvas

Zian Li

Muhan Zhang

DiffM VGen

158

15 Oct 2025

End-to-End Multi-Modal Diffusion Mamba

141

15 Oct 2025

NeuroRVQ: Multi-Scale EEG Tokenization for Generative Large Brainwave Models

166

15 Oct 2025

Your VAR Model is Secretly an Efficient and Explainable Generative Classifier

140

14 Oct 2025

BIGFix: Bidirectional Image Generation with Token Fixing

159

14 Oct 2025

What If: Understanding Motion Through Sparse Interactions

138

14 Oct 2025

Self-Supervised Selective-Guided Diffusion Model for Old-Photo Face Restoration

168

14 Oct 2025

Diffusion Transformers with Representation Autoencoders

214

13 Oct 2025

ProteinAE: Protein Diffusion Autoencoders for Structure Encoding

140

12 Oct 2025

Are Video Models Emerging as Zero-Shot Learners and Reasoners in Medical Imaging?

163

11 Oct 2025

Generative Latent Video Compression

160

11 Oct 2025

Lesion-Aware Post-Training of Latent Diffusion Models for Synthesizing Diffusion MRI from CT Perfusion

International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025

10 Oct 2025

iMoWM: Taming Interactive Multi-Modal World Model for Robotic Manipulation

115

10 Oct 2025

Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy

180

10 Oct 2025

Optimal Stopping in Latent Diffusion Models

140

09 Oct 2025

Don't Run with Scissors: Pruning Breaks VLA Models but They Can Be Recovered

142

09 Oct 2025

Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer

...

162

08 Oct 2025

Heptapod: Language Modeling on Visual Signals

162

08 Oct 2025

Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications

IEEE Access, 2025

305

08 Oct 2025

We Can Hide More Bits: The Unused Watermarking Capacity in Theory and in Practice

155

07 Oct 2025

Efficient Conditional Generation on Scale-based Visual Autoregressive Models

203

07 Oct 2025

Riddled basin geometry sets fundamental limits to predictability and reproducibility in deep learning

Andrew Ly

Pulin Gong

AI4CE

187

07 Oct 2025

Parallel Tokenizers: Rethinking Vocabulary Design for Cross-Lingual Transfer

Muhammad Dehan Al Kautsar

Fajri Koto

214

07 Oct 2025

BlockGPT: Spatio-Temporal Modelling of Rainfall via Frame-Level Autoregression

192

07 Oct 2025

$\mathbf{D^3}$QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection

193

07 Oct 2025

CodeFormer++: Blind Face Restoration Using Deformable Registration and Deep Metric Learning

Venkata Bharath Reddy Reddem

Akshay P Sarashetti

Ranjith Merugu

Amit Satish Unde

123

06 Oct 2025

VChain: Chain-of-Visual-Thought for Reasoning in Video Generation

06 Oct 2025

REAR: Rethinking Visual Autoregressive Models via Generator-Tokenizer Consistency Regularization

237

06 Oct 2025

Demystifying MaskGIT Sampler and Beyond: Adaptive Order Selection in Masked Diffusion

362

06 Oct 2025

Language Model Based Text-to-Audio Generation: Anti-Causally Aligned Collaborative Residual Transformers

170

06 Oct 2025

SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization

239

06 Oct 2025

Bridging Text and Video Generation: A Survey

264

06 Oct 2025

MASC: Boosting Autoregressive Image Generation with a Manifold-Aligned Semantic Clustering

Lixuan He

Shikang Zheng

Linfeng Zhang

162

05 Oct 2025

Product-Quantised Image Representation for High-Quality Image Synthesis

Denis Zavadski

Nikita Philip Tatsch

Carsten Rother

107

03 Oct 2025

MelTok: 2D Tokenization for Single-Codebook Audio Compression

311

02 Oct 2025

Growing Visual Generative Capacity for Pre-Trained MLLMs

217

02 Oct 2025

Variational Secret Common Randomness Extraction

115

02 Oct 2025

Ultra-Efficient Decoding for End-to-End Neural Compression and Reconstruction

Ethan G Rogers

Cheng Wang

131

01 Oct 2025