2012.09841
Taming Transformers for High-Resolution Image Synthesis
Computer Vision and Pattern Recognition (CVPR), 2020
17 December 2020
Patrick Esser
Robin Rombach
Björn Ommer
ViT
Papers citing "Taming Transformers for High-Resolution Image Synthesis"
50 / 2,404 papers shown
Heartcare Suite: A Unified Multimodal ECG Suite for Dual Signal-Image Modeling and Understanding
Yihan Xie
Sijing Li
Tianwei Lin
Zhuonan Wang
Chenglin Yang
...
Haoyuan Li
Hao Jiang
Tai-wei Chang
Qishan Chen
Jun Xiao
299
2
0
24 Dec 2025
DeRA: Decoupled Representation Alignment for Video Tokenization
Pengbo Guo
Junke Wang
Zhen Xing
Chengxu Liu
Daoguo Dong
Xueming Qian
Zuxuan Wu
AI4TS
103
0
0
04 Dec 2025
Semantic-Guided Two-Stage GAN for Face Inpainting with Hybrid Perceptual Encoding
Abhigyan Bhattacharya
Hiranmoy Roy
155
0
0
04 Dec 2025
Efficient Generative Transformer Operators For Million-Point PDEs
Armand K. Koupai
Lise Le Boudec
Patrick Gallinari
78
0
0
04 Dec 2025
Autoregressive Image Generation Needs Only a Few Lines of Cached Tokens
Ziran Qin
Youru Lv
Mingbao Lin
Zeren Zhang
Chanfan Gan
Tieyuan Chen
W. Lin
DiffM
VLM
106
1
0
04 Dec 2025
Rethinking Security in Semantic Communication: Latent Manipulation as a New Threat
Zhiyuan Xi
Kun Zhu
AAML
176
0
0
03 Dec 2025
What Is The Best 3D Scene Representation for Robotics? From Geometric to Foundation Models
Tianchen Deng
Yue Pan
Shenghai Yuan
Dong Li
Chen Wang
...
Danwei W. Wang
Jingchuan Wang
Javier Civera
Hesheng Wang
Weidong Chen
92
7
0
03 Dec 2025
LSRS: Latent Scale Rejection Sampling for Visual Autoregressive Modeling
Hong-Kai Zheng
Piji Li
69
0
0
03 Dec 2025
Hierarchical Process Reward Models are Symbolic Vision Learners
Shan Zhang
Aotian Chen
Kai Zou
Jindong Gu
Yuan Xue
Anton van den Hengel
68
0
0
02 Dec 2025
PixPerfect: Seamless Latent Diffusion Local Editing with Discriminative Pixel-Space Refinement
Haitian Zheng
Yuan Yao
Yongsheng Yu
Yuqian Zhou
Jiebo Luo
Zhe Lin
DiffM
136
1
0
02 Dec 2025
PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling
Bowen Ping
Chengyou Jia
Minnan Luo
Changliang Xia
Xin Shen
Zhuohang Dang
Hangwei Qian
EGVM
80
0
0
02 Dec 2025
Mitigating Intra- and Inter-modal Forgetting in Continual Learning of Unified Multimodal Models
Xiwen Wei
Mustafa Munir
R. Marculescu
CLL
282
0
0
02 Dec 2025
Co-speech Gesture Video Generation via Motion-Based Graph Retrieval
Yafei Song
Peng Zhang
Bang Zhang
DiffM
SLR
510
0
0
02 Dec 2025
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
Mengchen Zhang
Qi Chen
Tong Wu
Zihan Liu
Dahua Lin
VGen
196
1
0
02 Dec 2025
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models
Zhiheng Liu
Weiming Ren
Haozhe Liu
Zijian Zhou
S. Chen
...
Ping Luo
Wei Liu
Tao Xiang
Jonas Schult
Yuren Cong
166
2
0
01 Dec 2025
Deconstructing Generative Diversity: An Information Bottleneck Analysis of Discrete Latent Generative Models
Yudi Wu
Wenhao Zhao
Dianbo Liu
109
0
0
01 Dec 2025
ManualVLA: A Unified VLA Model for Chain-of-Thought Manual Generation and Robotic Manipulation
Chenyang Gu
Jiaming Liu
Hao Chen
Runzhong Huang
Qingpo Wuwu
...
Ying Li
Renrui Zhang
Peng Jia
Pheng-Ann Heng
Shanghang Zhang
LM&Ro
161
1
0
01 Dec 2025
VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction
Sinan Du
Jiahao Guo
Bo Li
Shuhao Cui
Zhengzhuo Xu
...
Yongxian Wei
Kun Gai
X. Wang
Kai Wu
C. Yuan
227
1
0
28 Nov 2025
Visual Generation Tuning
Jiahao Guo
Sinan Du
J. Yao
Wenyu Liu
Bo Li
Haoxiang Cao
Kun Gai
C. Yuan
Kai Wu
Xinggang Wang
VLM
306
0
0
28 Nov 2025
REVEAL: Reasoning-enhanced Forensic Evidence Analysis for Explainable AI-generated Image Detection
Huangsen Cao
Qin Mei
Zhiheng Li
Yuxi Li
Ying Zhang
...
Zhimeng Zhang
Xin Ding
Yongwei Wang
Jing Lyu
Fei Wu
138
0
0
28 Nov 2025
Quantized-Tinyllava: a new multimodal foundation model enables efficient split learning
J. Guo
Xin Luo
Jie Liu
Yiqun Wang
Kai-Wei Chang
Wei Wang
Jie Liu
100
0
0
28 Nov 2025
Guiding Visual Autoregressive Models through Spectrum Weakening
Chaoyang Wang
Tianmeng Yang
Jingdong Wang
Yunhai Tong
DiffM
176
0
0
28 Nov 2025
Bringing Your Portrait to 3D Presence
Jiawei Zhang
Lei Chu
Jiahao Li
Zhenyu Zang
Chong Li
Xiao Li
Xun Cao
Hao Zhu
Yan Lu
3DH
245
1
0
27 Nov 2025
Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment
Yang Chen
Xiaowei Xu
S. Wang
C. Zhu
Ruxue Wen
X. Li
Tiezheng Ge
Limin Wang
78
0
0
27 Nov 2025
The Collapse of Patches
Wei Guo
Shunqi Mao
Zhuonan Liang
Heng Wang
Weidong Cai
67
0
0
27 Nov 2025
Adversarial Flow Models
Shanchuan Lin
Ceyuan Yang
Zhijie Lin
Hao Chen
Haoqi Fan
GAN
157
0
0
27 Nov 2025
Progress by Pieces: Test-Time Scaling for Autoregressive Image Generation
Joonhyung Park
Hyeongwon Jang
Joowon Kim
Eunho Yang
VLM
159
0
0
26 Nov 2025
DiverseVAR: Balancing Diversity and Quality of Next-Scale Visual Autoregressive Models
Mingue Park
Prin Phunyaphibarn
Phillip Y. Lee
Minhyuk Sung
124
0
0
26 Nov 2025
DINO-Tok: Adapting DINO for Visual Tokenizers
Mingkai Jia
Mingxiao Li
Liaoyuan Fan
Tianxing Shi
Jiaxin Guo
...
Xiaoyang Guo
Xiao-Xiao Long
Qian Zhang
P. Tan
Wei Yin
ViT
201
0
0
25 Nov 2025
PromptMoG: Enhancing Diversity in Long-Prompt Image Generation via Prompt Embedding Mixture-of-Gaussian Sampling
Bo-Kai Ruan
Teng-Fang Hsiao
Ling Lo
Yi-Lun Wu
Hong-Han Shuai
DiffM
VLM
189
0
0
25 Nov 2025
Temporal-Visual Semantic Alignment: A Unified Architecture for Transferring Spatial Priors from Vision Models to Zero-Shot Temporal Tasks
Xiangkai Ma
Han Zhang
Wenzhong Li
Sanglu Lu
AI4TS
VGen
291
0
0
25 Nov 2025
LATTICE: Democratize High-Fidelity 3D Generation at Scale
Zeqiang Lai
Yunfei Zhao
Zibo Zhao
Haolin Liu
Qingxiang Lin
Jingwei Huang
Chunchao Guo
Xiangyu Yue
65
2
0
24 Nov 2025
Understanding, Accelerating, and Improving MeanFlow Training
J. Kim
Hyojun Go
L. Bogensperger
Julius Erbach
Nikolai Kalischek
Federico Tombari
Konrad Schindler
Dominik Narnhofer
AI4CE
237
0
0
24 Nov 2025
FVAR: Visual Autoregressive Modeling via Next Focus Prediction
Xiaofan Li
Chenming Wu
Yanpeng Sun
Jiaming Zhou
Delin Qu
Yansong Qu
Weihao Bo
Haibao Yu
Dingkang Liang
VGen
171
0
0
24 Nov 2025
CoD: A Diffusion Foundation Model for Image Compression
Zhaoyang Jia
Zihan Zheng
Naifu Xue
Jiahao Li
Bin Li
Zongyu Guo
Xiaoyi Zhang
Houqiang Li
Yan Lu
DiffM
379
0
0
24 Nov 2025
MammothModa2: A Unified AR-Diffusion Framework for Multimodal Understanding and Generation
Tao Shen
Xin Wan
Taicai Chen
Rui Zhang
Junwen Pan
...
Y. Yang
Chen Cheng
Qi She
Chang Liu
Zhenbang Sun
DiffM
108
1
0
23 Nov 2025
MINDiff: Mask-Integrated Negative Attention for Controlling Overfitting in Text-to-Image Personalization
Seulgi Jeong
Jaeil Kim
DiffM
144
0
0
22 Nov 2025
Spanning Tree Autoregressive Visual Generation
Sangkyu Lee
Changho Lee
Janghoon Han
Hosung Song
Tackgeun You
Hwasup Lim
Stanley Jungkyu Choi
Honglak Lee
Youngjae Yu
205
0
0
21 Nov 2025
RynnVLA-002: A Unified Vision-Language-Action and World Model
Jun Cen
Siteng Huang
Yuqian Yuan
Kehan Li
Hangjie Yuan
...
Xin Li
Hao Luo
Fan Wang
Deli Zhao
H. Chen
VGen
SyDa
325
2
0
21 Nov 2025
FireScope: Wildfire Risk Prediction with a Chain-of-Thought Oracle
Mario Markov
Stefan Maria Ailuro
Luc Van Gool
Konrad Schindler
D. Paudel
LRM
171
0
0
21 Nov 2025
H-GAR: A Hierarchical Interaction Framework via Goal-Driven Observation-Action Refinement for Robotic Manipulation
Yijie Zhu
Rui Shao
Ziyang Liu
Jie He
Jizhihui Liu
Jiuru Wang
Zitong Yu
222
1
0
21 Nov 2025
AMS-KV: Adaptive KV Caching in Multi-Scale Visual Autoregressive Transformers
Boxun Xu
Yu Wang
Zihu Wang
Peng Li
VLM
297
0
0
20 Nov 2025
Flow and Depth Assisted Video Prediction with Latent Transformer
Eliyas Suleyman
Paul Henderson
Eksan Firkat
Nicolas Pugeault
159
0
0
20 Nov 2025
Progressive Supernet Training for Efficient Visual Autoregressive Modeling
Xiaoyue Chen
Yuling Shi
Kaiyuan Li
Huandong Wang
Yong Li
Xiaodong Gu
Xinlei Chen
Mingbao Lin
110
0
0
20 Nov 2025
LiSTAR: Ray-Centric World Models for 4D LiDAR Sequences in Autonomous Driving
Pei Liu
Songtao Wang
Lang Zhang
Xingyue Peng
Yuandong Lyu
...
Weiliang Ma
Xueyang Zhang
Yifei Zhan
Xianpeng Lang
Jun Ma
SyDa
416
0
0
20 Nov 2025
Decoupling Complexity from Scale in Latent Diffusion Model
Tianxiong Zhong
Xingye Tian
X. Wang
Boyuan Jiang
Xin Tao
Pengfei Wan
DiffM
320
1
0
20 Nov 2025
Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning
Yuxuan Gu
Weimin Bai
Yifei Wang
Weijian Luo
H. Sun
DiffM
OffRL
257
0
0
19 Nov 2025
UniHOI: Unified Human-Object Interaction Understanding via Unified Token Space
Panqi Yang
Haodong Jing
Nanning Zheng
Yongqiang Ma
220
0
0
19 Nov 2025
GloTok: Global Perspective Tokenizer for Image Reconstruction and Generation
Xuan Zhao
Zhongyu Zhang
Y. Huang
Yuxi Mi
Guodong Mu
Shouhong Ding
Jun Wang
R. Guo
Shuigeng Zhou
VLM
268
0
0
18 Nov 2025
Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion
Zhuo Li
Junjia Liu
Zhipeng Dong
Tao Teng
Quentin Rouxel
D. Caldwell
Fei Chen
110
0
0
18 Nov 2025