Taming Transformers for High-Resolution Image Synthesis

Computer Vision and Pattern Recognition (CVPR), 2021
17 December 2020
Patrick Esser
Robin Rombach
Björn Ommer
    ViT
ArXiv (abs) | PDF | HTML | GitHub (6185★)

Papers citing "Taming Transformers for High-Resolution Image Synthesis"

50 / 2,374 papers shown
Progress by Pieces: Test-Time Scaling for Autoregressive Image Generation
Joonhyung Park
Hyeongwon Jang
Joowon Kim
Eunho Yang
VLM
76
0
0
26 Nov 2025
DiverseVAR: Balancing Diversity and Quality of Next-Scale Visual Autoregressive Models
Mingue Park
Prin Phunyaphibarn
Phillip Y. Lee
Minhyuk Sung
76
0
0
26 Nov 2025
Temporal-Visual Semantic Alignment: A Unified Architecture for Transferring Spatial Priors from Vision Models to Zero-Shot Temporal Tasks
Xiangkai Ma
Han Zhang
Wenzhong Li
Sanglu Lu
AI4TS, VGen
179
0
0
25 Nov 2025
DINO-Tok: Adapting DINO for Visual Tokenizers
Mingkai Jia
Mingxiao Li
Liaoyuan Fan
Tianxing Shi
Jiaxin Guo
...
Xiaoyang Guo
Xiao-Xiao Long
Qian Zhang
P. Tan
Wei Yin
ViT
120
0
0
25 Nov 2025
PromptMoG: Enhancing Diversity in Long-Prompt Image Generation via Prompt Embedding Mixture-of-Gaussian Sampling
Bo-Kai Ruan
Teng-Fang Hsiao
Ling Lo
Yi-Lun Wu
Hong-Han Shuai
DiffM, VLM
115
0
0
25 Nov 2025
Understanding, Accelerating, and Improving MeanFlow Training
J. Kim
Hyojun Go
L. Bogensperger
Julius Erbach
Nikolai Kalischek
Federico Tombari
Konrad Schindler
Dominik Narnhofer
AI4CE
159
0
0
24 Nov 2025
FVAR: Visual Autoregressive Modeling via Next Focus Prediction
Xiaofan Li
Chenming Wu
Yanpeng Sun
Jiaming Zhou
Delin Qu
Yansong Qu
Weihao Bo
Haibao Yu
Dingkang Liang
VGen
84
0
0
24 Nov 2025
CoD: A Diffusion Foundation Model for Image Compression
Zhaoyang Jia
Zihan Zheng
Naifu Xue
Jiahao Li
Bin Li
Zongyu Guo
Xiaoyi Zhang
Houqiang Li
Yan Lu
DiffM
216
0
0
24 Nov 2025
MammothModa2: A Unified AR-Diffusion Framework for Multimodal Understanding and Generation
Tao Shen
Xin Wan
Taicai Chen
Rui Zhang
Junwen Pan
...
Y. Yang
Chen Cheng
Qi She
Chang Liu
Zhenbang Sun
DiffM
28
0
0
23 Nov 2025
MINDiff: Mask-Integrated Negative Attention for Controlling Overfitting in Text-to-Image Personalization
Seulgi Jeong
Jaeil Kim
DiffM
60
0
0
22 Nov 2025
FireScope: Wildfire Risk Prediction with a Chain-of-Thought Oracle
Mario Markov
Stefan Maria Ailuro
Luc Van Gool
Konrad Schindler
D. Paudel
LRM
59
0
0
21 Nov 2025
H-GAR: A Hierarchical Interaction Framework via Goal-Driven Observation-Action Refinement for Robotic Manipulation
Yijie Zhu
Rui Shao
Ziyang Liu
Jie He
Jizhihui Liu
Jiuru Wang
Zitong Yu
122
1
0
21 Nov 2025
RynnVLA-002: A Unified Vision-Language-Action and World Model
Jun Cen
Siteng Huang
Yuqian Yuan
Kehan Li
Hangjie Yuan
...
Xin Li
Hao Luo
Fan Wang
Deli Zhao
H. Chen
VGen, SyDa
233
0
0
21 Nov 2025
Spanning Tree Autoregressive Visual Generation
Sangkyu Lee
Changho Lee
Janghoon Han
Hosung Song
Tackgeun You
Hwasup Lim
Stanley Jungkyu Choi
Honglak Lee
Youngjae Yu
168
0
0
21 Nov 2025
Progressive Supernet Training for Efficient Visual Autoregressive Modeling
Xiaoyue Chen
Yuling Shi
Kaiyuan Li
Huandong Wang
Yong Li
Xiaodong Gu
Xinlei Chen
Mingbao Lin
48
0
0
20 Nov 2025
Flow and Depth Assisted Video Prediction with Latent Transformer
Eliyas Suleyman
Paul Henderson
Eksan Firkat
Nicolas Pugeault
70
0
0
20 Nov 2025
Decoupling Complexity from Scale in Latent Diffusion Model
Tianxiong Zhong
Xingye Tian
X. Wang
Boyuan Jiang
Xin Tao
Pengfei Wan
DiffM
274
0
0
20 Nov 2025
LiSTAR: Ray-Centric World Models for 4D LiDAR Sequences in Autonomous Driving
Pei Liu
Songtao Wang
Lang Zhang
Xingyue Peng
Yuandong Lyu
...
Weiliang Ma
Xueyang Zhang
Yifei Zhan
Xianpeng Lang
Jun Ma
SyDa
244
0
0
20 Nov 2025
AMS-KV: Adaptive KV Caching in Multi-Scale Visual Autoregressive Transformers
Boxun Xu
Yu Wang
Zihu Wang
Peng Li
VLM
209
0
0
20 Nov 2025
Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning
Yuxuan Gu
Weimin Bai
Yifei Wang
Weijian Luo
H. Sun
DiffM, OffRL
215
0
0
19 Nov 2025
UniHOI: Unified Human-Object Interaction Understanding via Unified Token Space
Panqi Yang
Haodong Jing
Nanning Zheng
Yongqiang Ma
158
0
0
19 Nov 2025
X-WIN: Building Chest Radiograph World Model via Predictive Sensing
Zefan Yang
Ge Wang
James A. Hendler
Mannudeep K. Kalra
Pingkun Yan
MedIm
113
0
0
18 Nov 2025
GloTok: Global Perspective Tokenizer for Image Reconstruction and Generation
Xuan Zhao
Zhongyu Zhang
Y. Huang
Yuxi Mi
Guodong Mu
Shouhong Ding
Jun Wang
R. Guo
Shuigeng Zhou
VLM
189
0
0
18 Nov 2025
Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion
Zhuo Li
Junjia Liu
Zhipeng Dong
Tao Teng
Quentin Rouxel
D. Caldwell
Fei Chen
68
0
0
18 Nov 2025
CoordAR: One-Reference 6D Pose Estimation of Novel Objects via Autoregressive Coordinate Map Generation
Dexin Zuo
Ang Li
Wei Wang
Wenxian Yu
Danping Zou
3DPC
120
0
0
17 Nov 2025
Infinite-Story: A Training-Free Consistent Text-to-Image Generation
Jihun Park
Kyoungmin Lee
Jongmin Gim
Hyeonseo Jo
Minseok Oh
Wonhyeok Choi
K. Hwang
Jaeyeul Kim
Minwoo Choi
S. Im
79
0
1
17 Nov 2025
ActVAR: Activating Mixtures of Weights and Tokens for Efficient Visual Autoregressive Generation
Kaixin Zhang
Ruiqing Yang
Yuan Zhang
Shan You
Tao Huang
VLM
79
0
0
17 Nov 2025
Seg-VAR: Image Segmentation with Visual Autoregressive Modeling
Rongkun Zheng
Lu Qi
Xi Chen
Yi Wang
K. Wang
Hengshuang Zhao
96
0
0
16 Nov 2025
MixAR: Mixture Autoregressive Image Generation
Jinyuan Hu
Jiayou Zhang
Shaobo Cui
Kun Zhang
Guangyi Chen
DiffM
96
0
0
15 Nov 2025
Point Cloud Quantization through Multimodal Prompting for 3D Understanding
Hongxuan Li
Wencheng Zhu
Huiying Xu
Xinzhong Zhu
Pengfei Zhu
MQ, 3DPC
321
0
0
15 Nov 2025
Improved Masked Image Generation with Knowledge-Augmented Token Representations
Guotao Liang
Baoquan Zhang
Zhiyuan Wen
Zihao Han
Yunming Ye
72
0
0
15 Nov 2025
FlowCast: Advancing Precipitation Nowcasting with Conditional Flow Matching
Bernardo Perrone Ribeiro
Jana Faganeli Pucer
114
0
0
12 Nov 2025
Retrospective motion correction in MRI using disentangled embeddings
Qi Wang
Veronika Ecker
Marcel Früh
S. Gatidis
Thomas Küstner
36
0
0
11 Nov 2025
MRT: Learning Compact Representations with Mixed RWKV-Transformer for Extreme Image Compression
Han Liu
Hengyu Man
Xingtao Wang
Wenrui Li
Debin Zhao
ViT
73
0
0
10 Nov 2025
VAEVQ: Enhancing Discrete Visual Tokenization through Variational Modeling
Sicheng Yang
Xing Hu
Qiang Wu
Dawei Yang
149
0
0
10 Nov 2025
PADM: A Physics-aware Diffusion Model for Attenuation Correction
T. Pham
Hoang Minh Vu
Anh Duc Chu
D. Nguyen
Trung Thanh Nguyen
Thao Nguyen Truong
Mai Hong Son
T. Nguyen
Phi Le Nguyen
MedIm
90
0
0
10 Nov 2025
MALeR: Improving Compositional Fidelity in Layout-Guided Generation
Shivank Saxena
D. Srivastava
Makarand Tapaswi
DiffM
74
0
0
08 Nov 2025
PhysCorr: Dual-Reward DPO for Physics-Constrained Text-to-Video Generation with Automated Preference Selection
Peiyao Wang
Weining Wang
Qi Li
EGVM, VGen
331
1
0
06 Nov 2025
CPO: Condition Preference Optimization for Controllable Image Generation
Zonglin Lyu
Ming Li
Xinxin Liu
Chen Chen
160
0
0
06 Nov 2025
DiffSwap++: 3D Latent-Controlled Diffusion for Identity-Preserving Face Swapping
Weston Bondurant
Arkaprava Sinha
Hieu M. Le
Srijan Das
Stephanie Schuckers
DiffM
125
0
0
04 Nov 2025
Effective Test-Time Scaling of Discrete Diffusion through Iterative Refinement
Sanghyun Lee
Sunwoo Kim
Seungryong Kim
Jongho Park
D. Park
52
0
0
04 Nov 2025
NSYNC: Negative Synthetic Image Generation for Contrastive Training to Improve Stylized Text-To-Image Translation
Serkan Ozturk
Samet Hicsonmez
Pinar Duygulu
DiffM
245
0
0
03 Nov 2025
EVLP: Learning Unified Embodied Vision-Language Planner with Reinforced Supervised Fine-Tuning
Xinyan Cai
Shiguang Wu
Dafeng Chi
Yuzheng Zhuang
Xingyue Quan
Jianye Hao
Qiang Guan
57
0
0
03 Nov 2025
MoSa: Motion Generation with Scalable Autoregressive Modeling
Mengyuan Liu
Sheng Yan
Y. Wang
Yingjie Li
Gui-Bin Bian
Hong Liu
150
0
0
03 Nov 2025
Wave-Particle (Continuous-Discrete) Dualistic Visual Tokenization for Unified Understanding and Generation
Yizhu Chen
Chen Ju
Z. Wang
Shuai Xiao
X. Chen
Jinsong Lan
Xiaoyong Zhu
Ying Chen
83
0
0
03 Nov 2025
InertialAR: Autoregressive 3D Molecule Generation with Inertial Frames
Haorui Li
Weitao Du
Yuqiang Li
Hongyu Guo
Shengchao Liu
60
1
0
31 Oct 2025
MoRE: 3D Visual Geometry Reconstruction Meets Mixture-of-Experts
Jingnan Gao
Zhe Wang
X. Fang
X. Ren
Z. Chen
Shengqi Liu
Y. Cheng
Jiangjing Lyu
Xiaokang Yang
Y. Yan
176
0
0
31 Oct 2025
Continuous Autoregressive Language Models
Chenze Shao
Darren Li
Fandong Meng
Jie Zhou
KELM
214
0
0
31 Oct 2025
Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail
Nvidia
Yan Wang
W. Luo
Junjie Bai
Yulong Cao
...
Yurong You
Xiaohui Zeng
Wenyuan Zhang
Boris Ivanovic
Marco Pavone
LRM
100
7
0
30 Oct 2025
Emu3.5: Native Multimodal Models are World Learners
Yufeng Cui
Honghao Chen
Haoge Deng
X. Y. Huang
Xinghang Li
...
Zhuo Chen
Yulong Ao
Tiejun Huang
Zhongyuan Wang
Xinlong Wang
MLLM, VGen
376
8
0
30 Oct 2025