ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2012.09841
  4. Cited By
Taming Transformers for High-Resolution Image Synthesis

Taming Transformers for High-Resolution Image Synthesis

17 December 2020
Patrick Esser
Robin Rombach
Bjorn Ommer
    ViT
ArXivPDFHTML

Papers citing "Taming Transformers for High-Resolution Image Synthesis"

50 / 476 papers shown
Title
Continuous Visual Autoregressive Generation via Score Maximization
Continuous Visual Autoregressive Generation via Score Maximization
Chenze Shao
Fandong Meng
Jie Zhou
DiffM
21
0
0
12 May 2025
H$^{\mathbf{3}}$DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning
H3^{\mathbf{3}}3DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning
Yiyang Lu
Yufeng Tian
Zhecheng Yuan
X. Wang
Pu Hua
Zhengrong Xue
Huazhe Xu
21
0
0
12 May 2025
Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation
Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation
Kunpeng Qiu
Zhiqiang Gao
Zhiying Zhou
Mingjie Sun
Yongxin Guo
MedIm
29
0
0
09 May 2025
OWT: A Foundational Organ-Wise Tokenization Framework for Medical Imaging
OWT: A Foundational Organ-Wise Tokenization Framework for Medical Imaging
Sifan Song
Siyeop Yoon
Pengfei Jin
Sekeun Kim
Matthew Tivnan
...
Zhiliang Lyu
Dufan Wu
Ning Guo
Xiang Li
Quanzheng Li
OOD
ViT
57
0
0
08 May 2025
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Haokun Lin
Teng Wang
Yixiao Ge
Yuying Ge
Zhichao Lu
Ying Wei
Qingfu Zhang
Zhenan Sun
Ying Shan
MLLM
VLM
64
0
0
08 May 2025
PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer
PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer
Jingwen Ye
Yuze He
Yanning Zhou
Yiqin Zhu
Kaiwen Xiao
Yong-Jin Liu
Wei Yang
Xiao Han
39
0
0
07 May 2025
Real-Time Person Image Synthesis Using a Flow Matching Model
Real-Time Person Image Synthesis Using a Flow Matching Model
Jiwoo Jeong
Kirok Kim
Wooju Kim
Nam-Joon Kim
3DH
58
0
0
06 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
X. Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
62
0
0
05 May 2025
RobSurv: Vector Quantization-Based Multi-Modal Learning for Robust Cancer Survival Prediction
RobSurv: Vector Quantization-Based Multi-Modal Learning for Robust Cancer Survival Prediction
Aiman Farooq
Azad Singh
Deepak Mishra
S. Chaudhury
27
0
0
05 May 2025
Any-to-Any Vision-Language Model for Multimodal X-ray Imaging and Radiological Report Generation
Any-to-Any Vision-Language Model for Multimodal X-ray Imaging and Radiological Report Generation
Daniele Molino
Francesco Di Feola
Linlin Shen
Paolo Soda
V. Guarrasi
MedIm
LM&MA
62
0
0
02 May 2025
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
D. Jiang
Ziyu Guo
Renrui Zhang
Zhuofan Zong
Hao Li
Le Zhuo
Shilin Yan
Pheng-Ann Heng
H. Li
LRM
57
0
0
01 May 2025
AGHI-QA: A Subjective-Aligned Dataset and Metric for AI-Generated Human Images
AGHI-QA: A Subjective-Aligned Dataset and Metric for AI-Generated Human Images
Yunhao Li
Sijing Wu
Wei Sun
Zhichao Zhang
Yucheng Zhu
Zicheng Zhang
Huiyu Duan
Xiongkuo Min
Guangtao Zhai
EGVM
81
0
0
30 Apr 2025
Revisiting Diffusion Autoencoder Training for Image Reconstruction Quality
Revisiting Diffusion Autoencoder Training for Image Reconstruction Quality
Pramook Khungurn
Sukit Seripanitkarn
Phonphrm Thawatdamrongkit
Supasorn Suwajanakorn
DiffM
68
0
0
30 Apr 2025
Why Compress What You Can Generate? When GPT-4o Generation Ushers in Image Compression Fields
Why Compress What You Can Generate? When GPT-4o Generation Ushers in Image Compression Fields
Yixin Gao
Xiaohan Pan
X. Li
Zhibo Chen
51
0
0
30 Apr 2025
GarmentX: Autoregressive Parametric Representations for High-Fidelity 3D Garment Generation
GarmentX: Autoregressive Parametric Representations for High-Fidelity 3D Garment Generation
Jingfeng Guo
J. Chen
Weikai Chen
Zhenyu Sun
Lanjiong Li
Baozhu Zhao
Lingting Zhu
X. Wang
Qi Liu
3DH
80
0
0
29 Apr 2025
EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation
EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation
Zhe Dong
Yuzhe Sun
Tianzhu Liu
Wangmeng Zuo
Yanfeng Gu
51
0
0
28 Apr 2025
RadioFormer: A Multiple-Granularity Radio Map Estimation Transformer with 1\textpertenthousand Spatial Sampling
RadioFormer: A Multiple-Granularity Radio Map Estimation Transformer with 1\textpertenthousand Spatial Sampling
Zheng Fang
Kangjun Liu
Ke Chen
Qingyu Liu
J. Zhang
Lingyang Song
Yaowei Wang
36
0
0
27 Apr 2025
REED-VAE: RE-Encode Decode Training for Iterative Image Editing with Diffusion Models
REED-VAE: RE-Encode Decode Training for Iterative Image Editing with Diffusion Models
Gal Almog
Ariel Shamir
Ohad Fried
DiffM
53
0
0
26 Apr 2025
E-InMeMo: Enhanced Prompting for Visual In-Context Learning
E-InMeMo: Enhanced Prompting for Visual In-Context Learning
Jiahao Zhang
Bowen Wang
Hong Liu
Liangzhi Li
Yuta Nakashima
Hajime Nagahara
VLM
99
0
0
25 Apr 2025
Dual Prompting Image Restoration with Diffusion Transformers
Dual Prompting Image Restoration with Diffusion Transformers
Dehong Kong
Fan Li
Zhixin Wang
Jiaqi Xu
Renjing Pei
W. J. Li
Wenqi Ren
DiffM
62
0
0
24 Apr 2025
Fast Autoregressive Models for Continuous Latent Generation
Fast Autoregressive Models for Continuous Latent Generation
Tiankai Hang
Jianmin Bao
Fangyun Wei
Dong Chen
DiffM
70
0
0
24 Apr 2025
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Xu Ma
Peize Sun
Haoyu Ma
Hao Tang
Chih-Yao Ma
...
Matt Feiszli
Peizhao Zhang
Peter Vajda
Sam S. Tsai
Y. Fu
68
1
0
24 Apr 2025
Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis
Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis
Jingjing Ren
Wenbo Li
Zhongdao Wang
Haoze Sun
Bangzhen Liu
...
Aoxue Li
Shifeng Zhang
Bin Shao
Yong Guo
Lei Zhu
VGen
34
0
0
20 Apr 2025
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Kaihang Pan
Wang Lin
Zhongqi Yue
Tenglong Ao
Liyu Jia
Wei Zhao
Juncheng Billy Li
Siliang Tang
Hanwang Zhang
42
1
0
20 Apr 2025
Towards Explainable Fake Image Detection with Multi-Modal Large Language Models
Towards Explainable Fake Image Detection with Multi-Modal Large Language Models
Yikun Ji
Y. Hong
Jiahui Zhan
H. Chen
Jun Lan
Huijia Zhu
Weiqiang Wang
L. Zhang
Jianfu Zhang
MLLM
LRM
46
0
0
19 Apr 2025
Hierarchical Vector Quantized Graph Autoencoder with Annealing-Based Code Selection
Hierarchical Vector Quantized Graph Autoencoder with Annealing-Based Code Selection
Long Zeng
Jianxiang Yu
Jiapeng Zhu
Qingsong Zhong
Xiang Li
27
0
0
17 Apr 2025
SkyReels-V2: Infinite-length Film Generative Model
SkyReels-V2: Infinite-length Film Generative Model
Guibin Chen
D. Lin
Jiangping Yang
Chunze Lin
J. Zhu
...
Di Qiu
Debang Li
Zhengcong Fei
Yang Li
Yahui Zhou
DiffM
VGen
47
1
0
17 Apr 2025
Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception
Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception
Ziqi Pang
Xin Xu
Yu-Xiong Wang
DiffM
60
0
0
15 Apr 2025
Deep Generative Model-Based Generation of Synthetic Individual-Specific Brain MRI Segmentations
Deep Generative Model-Based Generation of Synthetic Individual-Specific Brain MRI Segmentations
Ruijie Wang
Luca Rossetto
Susan Mérillat
Christina Röcke
Mike Martin
Abraham Bernstein
DiffM
MedIm
54
0
0
15 Apr 2025
Learned Image Compression with Dictionary-based Entropy Model
Learned Image Compression with Dictionary-based Entropy Model
Jingbo Lu
Leheng Zhang
Xingyu Zhou
Mu-Wei Li
Wen Li
Shuhang Gu
37
0
0
01 Apr 2025
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning
Hang Guo
Yawei Li
Taolin Zhang
J. Wang
Tao Dai
Shu-Tao Xia
Luca Benini
67
1
0
30 Mar 2025
Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities
Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities
Raman Dutt
Harleen Hanspal
Guoxuan Xia
Petru-Daniel Tudosiu
Alexander Black
Yongxin Yang
Steven G. McDonagh
Sarah Parisot
MoE
38
0
0
28 Mar 2025
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Size Wu
W. Zhang
Lumin Xu
Sheng Jin
Zhonghua Wu
Qingyi Tao
Wentao Liu
Wei Li
Chen Change Loy
VGen
94
2
0
27 Mar 2025
MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation
MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation
Jinnan Chen
Lingting Zhu
Zeyu Hu
Shengju Qian
Y. Chen
Xin Wang
G. Lee
97
1
0
26 Mar 2025
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Yuchao Gu
Weijia Mao
Mike Zheng Shou
VGen
73
2
0
25 Mar 2025
Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models
Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models
Jinjin Zhang
Qiuyu Huang
Junjie Liu
Xiefan Guo
Di Huang
54
1
0
24 Mar 2025
PromptMobile: Efficient Promptus for Low Bandwidth Mobile Video Streaming
PromptMobile: Efficient Promptus for Low Bandwidth Mobile Video Streaming
Liming Liu
Jiangkai Wu
Haoyang Wang
Peiheng Wang
Xinggong Zhang
Zongming Guo
42
0
0
20 Mar 2025
Unleashing Vecset Diffusion Model for Fast Shape Generation
Unleashing Vecset Diffusion Model for Fast Shape Generation
Zeqiang Lai
Yunfei Zhao
Zibo Zhao
Haolin Liu
Fuyun Wang
...
Jinwei Huang
Yuhong Liu
Jie Jiang
Chunchao Guo
Xiangyu Yue
DiffM
102
0
0
20 Mar 2025
The Change You Want To Detect: Semantic Change Detection In Earth Observation With Hybrid Data Generation
The Change You Want To Detect: Semantic Change Detection In Earth Observation With Hybrid Data Generation
Benidir Yanis
Gonthier Nicolas
Mallet Clement
47
1
0
19 Mar 2025
Generating Multimodal Driving Scenes via Next-Scene Prediction
Generating Multimodal Driving Scenes via Next-Scene Prediction
Yanhao Wu
Haoyang Zhang
Tianwei Lin
Lichao Huang
Shujie Luo
Rui Wu
Congpei Qiu
Wei Ke
Tong Zhang
VGen
46
0
0
19 Mar 2025
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
Wei Song
Y. Wang
Zijia Song
Yadong Li
Haoze Sun
Weipeng Chen
Zenan Zhou
Jianhua Xu
Jiaqi Wang
Kaicheng Yu
60
2
0
18 Mar 2025
BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries
BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries
Tianle Li
Yongming Rao
Winston Hu
Yu Cheng
MLLM
61
0
0
16 Mar 2025
Direction-Aware Diagonal Autoregressive Image Generation
Direction-Aware Diagonal Autoregressive Image Generation
Yijia Xu
Jianzhong Ju
Jian Luan
J. Cui
47
0
0
14 Mar 2025
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
Jonas Belouadi
Eddy Ilg
M. Keuper
Hideki Tanaka
Masao Utiyama
Raj Dabre
Steffen Eger
Simone Paolo Ponzetto
50
0
0
14 Mar 2025
Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption
Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption
Du Chen
Tianhe Wu
Kede Ma
Lei Zhang
33
2
0
14 Mar 2025
Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization
Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization
Kyle Sargent
Kyle Hsu
Justin Johnson
L. Fei-Fei
Jiajun Wu
DiffM
MU
53
2
0
14 Mar 2025
TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction
TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction
Xuying Zhang
Yutong Liu
Yangguang Li
Renrui Zhang
Y. Liu
...
Wanli Ouyang
Zhiwei Xiong
Peng Gao
Qibin Hou
Ming-Ming Cheng
118
3
0
13 Mar 2025
PromptMap: An Alternative Interaction Style for AI-Based Image Generation
PromptMap: An Alternative Interaction Style for AI-Based Image Generation
Krzysztof Adamkiewicz
Paweł W. Woźniak
Julia Dominiak
Andrzej Romanowski
Jakob Karolus
Stanislav Frolov
59
1
0
12 Mar 2025
Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment
Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment
Xing Xie
Jiawei Liu
Ziyue Lin
Huijie Fan
Zhi-Long Han
Yandong Tang
Liangqiong Qu
40
0
0
10 Mar 2025
Effective and Efficient Masked Image Generation Models
Effective and Efficient Masked Image Generation Models
Zebin You
Jingyang Ou
Xiaolu Zhang
Jun Hu
Jun Zhou
Chongxuan Li
DiffM
VLM
54
1
0
10 Mar 2025
1234...8910
Next