Taming Transformers for High-Resolution Image Synthesis

Computer Vision and Pattern Recognition (CVPR), 2021
17 December 2020
Patrick Esser
Robin Rombach
Björn Ommer
    ViT
arXiv:2012.09841 · GitHub (6185★)

Papers citing "Taming Transformers for High-Resolution Image Synthesis"

50 / 2,401 papers shown
Image Tokenizer Needs Post-Training
Kai Qiu
Xiang Li
Hao Chen
Jason Kuen
Xiaohao Xu
Jiuxiang Gu
Yinyi Luo
Bhiksha Raj
Zhe Lin
Marios Savvides
VLM
188
4
0
15 Sep 2025
AvatarSync: Rethinking Talking-Head Animation through Phoneme-Guided Autoregressive Perspective
Yuchen Deng
Xiuyang Wu
Hai-Tao Zheng
Suiyang Zhang
Yi He
Yuxing Han
VGen
112
0
0
15 Sep 2025
Layout-Conditioned Autoregressive Text-to-Image Generation via Structured Masking
Zirui Zheng
Takashi Isobe
Tong Shen
Xu Jia
Jianbin Zhao
...
Dong Li
Dong Zhou
Yunzhi Zhuge
Huchuan Lu
E. Barsoum
159
1
0
15 Sep 2025
InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
Tao Han
Wanghan Xu
Junchao Gong
Xiaoyu Yue
Song Guo
Luping Zhou
Lei Bai
122
1
0
12 Sep 2025
Scalable Training for Vector-Quantized Networks with 100% Codebook Utilization
Yifan Chang
Jie Qin
Limeng Qiao
Xiaofeng Wang
Zheng Zhu
Lin Ma
Xingang Wang
MQ
148
3
0
12 Sep 2025
Bitrate-Controlled Diffusion for Disentangling Motion and Content in Video
Xiao Li
Qi Chen
Xiulian Peng
K. Yu
Xie Chen
Yan Lu
DiffM, VGen
112
1
0
10 Sep 2025
Streaming Sequence-to-Sequence Learning with Delayed Streams Modeling
Neil Zeghidour
Eugene Kharitonov
Manu Orsini
Václav Volhejn
Gabriel de Marmiesse
Edouard Grave
P. Pérez
Laurent Mazaré
Alexandre Défossez
OffRL
223
5
0
10 Sep 2025
Diffusion-Based Action Recognition Generalizes to Untrained Domains
Rogério Guimarães
Frank Xiao
Pietro Perona
Markus Marks
269
0
0
10 Sep 2025
World Modeling with Probabilistic Structure Integration
Klemen Kotar
Wanhee Lee
Rahul Venkatesh
Honglin Chen
Daniel M. Bear
...
Imran Thobani
Alex Durango
Khaled Jedoui
Atlas Kazemian
Dan Yamins
132
3
0
10 Sep 2025
Reconstruction Alignment Improves Unified Multimodal Models
Ji Xie
Trevor Darrell
Luke Zettlemoyer
Xudong Wang
214
15
0
08 Sep 2025
PRIM: Towards Practical In-Image Multilingual Machine Translation
Yanzhi Tian
Zeming Liu
Zhengyang Liu
Chong Feng
Xin Li
Heyan Huang
Yuhang Guo
VLM
120
0
0
05 Sep 2025
Missing Fine Details in Images: Last Seen in High Frequencies
Tejaswini Medi
Hsien-Yi Wang
Arianna Rampini
Margret Keuper
294
2
0
05 Sep 2025
Discrete Noise Inversion for Next-scale Autoregressive Text-based Image Editing
Quan Dao
Xiaoxiao He
Ligong Han
Ngan Hoai Nguyen
Amin Heyrani Nobar
Faez Ahmed
Han Zhang
Viet Anh Nguyen
Dimitris N. Metaxas
DiffM
207
0
0
02 Sep 2025
Analysis of Speaker Verification Performance Trade-offs with Neural Audio Codec Transmission
Nirmalya Mallick Thakur
J. Yip
Eng Siong Chng
88
0
0
02 Sep 2025
2D Gaussian Splatting with Semantic Alignment for Image Inpainting
Hongyu Li
Chaofeng Chen
Xiaoming Li
Guangming Lu
3DGS
151
0
0
02 Sep 2025
GPSToken: Gaussian Parameterized Spatially-adaptive Tokenization for Image Representation and Generation
Zhengqiang Zhang
Rongyuan Wu
Lingchen Sun
Lei Zhang
277
2
0
01 Sep 2025
Disentangling Latent Embeddings with Sparse Linear Concept Subspaces (SLiCS)
Zhi Li
Hau Phan
Matthew Emigh
Austin J. Brockmeier
CoGe
160
0
0
27 Aug 2025
Controllable Skin Synthesis via Lesion-Focused Vector Autoregression Model
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Jiajun Sun
Zhen Yu
Siyuan Yan
Jason J. Ong
Zongyuan Ge
Lei Zhang
MedIm
84
0
0
27 Aug 2025
LSD-3D: Large-Scale 3D Driving Scene Generation with Geometry Grounding
Julian Ost
Andrea Ramazzina
Amogh Joshi
Maximilian Bömer
Mario Bijelic
Felix Heide
3DV
160
2
0
26 Aug 2025
CEIDM: A Controlled Entity and Interaction Diffusion Model for Enhanced Text-to-Image Generation
Mingyue Yang
Dianxi Shi
Jialu Zhou
Xinyu Wei
Leqian Li
Shaowu Yang
Chunping Qiu
121
0
0
25 Aug 2025
FlowVLA: Visual Chain of Thought-based Motion Reasoning for Vision-Language-Action Models
Zhide Zhong
Haodong Yan
Junfeng Li
Xiangchen Liu
Xin Gong
...
Wenxuan Song
Jiayi Chen
Xinhu Zheng
Hesheng Wang
Haoang Li
LRM, VGen
204
3
0
25 Aug 2025
Waver: Wave Your Way to Lifelike Video Generation
Yifu Zhang
Hao Yang
Yuqi Zhang
Yifei Hu
Fengda Zhu
Chuang Lin
Xiaofeng Mei
Yi Jiang
Zehuan Yuan
DiffM, VGen
162
0
0
21 Aug 2025
Visual Autoregressive Modeling for Instruction-Guided Image Editing
Qingyang Mao
Qi Cai
Yehao Li
Yingwei Pan
Mingyue Cheng
Ting Yao
Qi Liu
Tao Mei
DiffM
162
5
0
21 Aug 2025
Survey of Vision-Language-Action Models for Embodied Manipulation
Haoran Li
Yuhui Chen
Wenbo Cui
Weiheng Liu
Kai Liu
Mingcai Zhou
Zhengtao Zhang
Dongbin Zhao
LM&Ro
466
4
0
21 Aug 2025
Taming Transformer for Emotion-Controllable Talking Face Generation
Ziqi Zhang
Cheng Deng
CVBM
138
0
0
20 Aug 2025
Linear Preference Optimization: Decoupled Gradient Control via Absolute Regularization
Rui Wang
Qianguo Sun
Chao Song
Junlong Wu
Tianrong Chen
Zhiyun Zeng
Yu Li
211
1
0
20 Aug 2025
Disentanglement in T-space for Faster and Distributed Training of Diffusion Models with Fewer Latent-states
Samarth Gupta
Raghudeep Gadde
Rui Chen
Aleix M. Martinez
130
0
0
20 Aug 2025
From Basic Affordances to Symbolic Thought: A Computational Phylogenesis of Biological Intelligence
John E. Hummel
Rachel Heaton
87
0
0
20 Aug 2025
Temporal-Conditional Referring Video Object Segmentation with Noise-Free Text-to-Video Diffusion Model
Ruixin Zhang
Jiaqing Fan
Yifan Liao
Qian Qiao
Fanzhang Li
DiffM, VOS
254
0
0
19 Aug 2025
2D Gaussians Meet Visual Tokenizer
Yiang Shi
Xiaoyang Guo
Wei Yin
Mingkai Jia
Qian Zhang
Xiaolin Hu
Wenyu Liu
Xinggang Wang
3DGS
149
1
0
19 Aug 2025
InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing
Shaoshu Yang
Zhe Kong
Feng Gao
Meng Cheng
Xiangyu Liu
...
Zhuoliang Kang
Tong Lu
Xunliang Cai
Ran He
Xiaoming Wei
VGen
127
10
0
19 Aug 2025
Latent Interpolation Learning Using Diffusion Models for Cardiac Volume Reconstruction
Niklas Bubeck
Antonio Terpin
Chen Chen
Can Zhao
Pengfei Guo
...
Georg Zitzlsberger
Daguang Xu
Bernhard Kainz
Daniel Rueckert
J. Pan
DiffM, MedIm
309
1
0
19 Aug 2025
Next Visual Granularity Generation
Yikai Wang
Zhouxia Wang
Zhonghua Wu
Qingyi Tao
Kang Liao
Chen Change Loy
146
1
0
18 Aug 2025
Versatile Video Tokenization with Generative 2D Gaussian Splatting
Zhenghao Chen
Zicong Chen
Lei Liu
Yiming Wu
Dong Xu
3DGS
135
0
0
15 Aug 2025
Semi-supervised Image Dehazing via Expectation-Maximization and Bidirectional Brownian Bridge Diffusion Models
Bing-Quan Liu
Le Wang
Mingming Liu
Hao Liu
Rui Yao
Yong Zhou
Peng Liu
Tongqiang Xia
DiffM
65
0
0
15 Aug 2025
Integrating Reinforcement Learning with Visual Generative Models: Foundations and Advances
Yuanzhi Liang
Yijie Fang
Rui Li
Ziqi Ni
Ruijie Su
Chi Zhang
EGVM
306
2
0
14 Aug 2025
Large Model Empowered Embodied AI: A Survey on Decision-Making and Embodied Learning
Wenlong Liang
Rui Zhou
Yang Ma
Bing Zhang
Songlin Li
Yijia Liao
Ping Kuang
LM&Ro, 3DV, AI4CE
168
8
0
14 Aug 2025
GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning
Kelin Yu
Sheng Zhang
Harshit Soora
Furong Huang
Heng Huang
Erfaun Noorani
Ruohan Gao
VGen
92
4
0
14 Aug 2025
DAS: Dual-Aligned Semantic IDs Empowered Industrial Recommender System
International Conference on Information and Knowledge Management (CIKM), 2025
Wencai Ye
Mingjie Sun
Shaoyun Shi
Peng Wang
Wenjin Wu
Peng Jiang
165
2
0
14 Aug 2025
Exploiting Discriminative Codebook Prior for Autoregressive Image Generation
Longxiang Tang
Ruihang Chu
Xiang Wang
Yujin Han
Pingyu Wu
Chunming He
Yingya Zhang
Shiwei Zhang
Jiaya Jia
140
3
0
14 Aug 2025
Ultra-High-Definition Reference-Based Landmark Image Super-Resolution with Generative Diffusion Prior
Zhenning Shi
Zizheng Yan
Yuhang Yu
Clara Xue
Jingyu Zhuang
Qi Zhang
Jinwei Chen
Tao Li
Qingnan Fan
127
0
0
14 Aug 2025
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
NextStep Team
Chunrui Han
Guopeng Li
J. Wu
Quan Sun
...
Ziyang Meng
Binxing Jiao
Daxin Jiang
X. Zhang
Yibo Zhu
DiffM
201
22
0
14 Aug 2025
MInDI-3D: Iterative Deep Learning in 3D for Sparse-view Cone Beam Computed Tomography
Daniel Barco
Marc Stadelmann
Martin Oswald
Ivo Herzig
Lukas Lichtensteiger
Pascal Paysan
Igor Peterlik
Michal Walczak
Bjoern Menze
Frank-Peter Schilling
MedIm
192
0
0
13 Aug 2025
PaCo-FR: Patch-Pixel Aligned End-to-End Codebook Learning for Facial Representation Pre-training
Yin Xie
Zhichao Chen
Xiaoze Yu
Yongle Zhao
Xiang An
Kaicheng Yang
Zimin Ran
Jia Guo
Ziyong Feng
Jiankang Deng
152
0
0
13 Aug 2025
Prototype-Guided Diffusion: Visual Conditioning without External Memory
Hanane Azzag
M. Lebbah
DiffM, VLM
282
0
0
13 Aug 2025
Images Speak Louder Than Scores: Failure Mode Escape for Enhancing Generative Quality
Jie Shao
Ke Zhu
Minghao Fu
Guo-Hua Wang
Jianxin Wu
104
0
0
13 Aug 2025
OneVAE: Joint Discrete and Continuous Optimization Helps Discrete Video VAE Train Better
Yupeng Zhou
Zhen Li
Ziheng Ouyang
Yuming Chen
Ruoyi Du
...
Bin Fu
Yihao Liu
Peng Gao
Ming-Ming Cheng
Qibin Hou
200
1
0
13 Aug 2025
Stable Diffusion Models are Secretly Good at Visual In-Context Learning
Trevine Oorloff
Vishwanath Sindagi
Wele Gedara Chaminda Bandara
Ali Shafahi
Amin Ghiasi
Charan Prakash
R. Ardekani
DiffM, VLM
151
3
0
13 Aug 2025
RealisMotion: Decomposed Human Motion Control and Video Generation in the World Space
Jingyun Liang
Jingkai Zhou
Shikai Li
Chenjie Cao
Lei Sun
Yichen Qian
Weihua Chen
Fan Wang
DiffM, VGen
110
3
0
12 Aug 2025
Turbo-VAED: Fast and Stable Transfer of Video-VAEs to Mobile Devices
Ya Zou
Jingfeng Yao
Siyuan Yu
Shuai Zhang
Wenyu Liu
Xinggang Wang
ViT
159
2
0
12 Aug 2025