Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2012.09841
Cited By
Taming Transformers for High-Resolution Image Synthesis
17 December 2020
Patrick Esser
Robin Rombach
Bjorn Ommer
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Taming Transformers for High-Resolution Image Synthesis"
50 / 476 papers shown
Title
USP: Unified Self-Supervised Pretraining for Image Generation and Understanding
Xiangxiang Chu
Renda Li
Yong Wang
60
0
0
08 Mar 2025
Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator
Kaiwen Zheng
Yongxin Chen
Huayu Chen
Guande He
Ming-Yu Liu
J. Zhu
Qinsheng Zhang
DiffM
47
0
0
03 Mar 2025
DLF: Extreme Image Compression with Dual-generative Latent Fusion
Naifu Xue
Zhaoyang Jia
Jiahao Li
Bin Li
Yuan Zhang
Yan-Heng Lu
48
1
0
03 Mar 2025
WeGen: A Unified Model for Interactive Multimodal Generation as We Chat
Zhipeng Huang
Shaobin Zhuang
Canmiao Fu
Binxin Yang
Ying Zhang
Chong Sun
Zhizheng Zhang
Yali Wang
Chen Li
Zheng-Jun Zha
DiffM
69
1
0
03 Mar 2025
Raccoon: Multi-stage Diffusion Training with Coarse-to-Fine Curating Videos
Zhiyu Tan
Junyan Wang
Hao Yang
Luozheng Qin
Hesen Chen
Qiang-feng Zhou
Hao Li
VGen
64
0
0
28 Feb 2025
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
Sucheng Ren
Qihang Yu
Ju He
Xiaohui Shen
Alan Yuille
Liang-Chieh Chen
VGen
76
6
0
27 Feb 2025
Task-Driven Semantic Quantization and Imitation Learning for Goal-Oriented Communications
Yu-Chieh Chao
Yubei Chen
Weiwei Wang
Achintha Wijesinghe
Suchinthaka Wanninayaka
Songyang Zhang
Zhi Ding
DiffM
71
0
0
25 Feb 2025
LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation
Shuai Yang
Jing Tan
Mengchen Zhang
Tong Wu
Y. Li
Gordon Wetzstein
Ziwei Liu
D. Lin
MDE
VGen
51
6
0
24 Feb 2025
VaViM and VaVAM: Autonomous Driving through Video Generative Modeling
Florent Bartoccioni
Elias Ramzi
Victor Besnier
Shashanka Venkataramanan
Tuan-Hung Vu
...
Mickael Chen
Éloi Zablocki
Andrei Bursuc
Eduardo Valle
Matthieu Cord
VGen
78
1
0
24 Feb 2025
Controllable Unlearning for Image-to-Image Generative Models via
ε
\varepsilon
ε
-Constrained Optimization
Xiaohua Feng
Chao-Jun Chen
Yuyuan Li
L. Zhang
Longfei Li
Jun Zhou
Xiaolin Zheng
MU
68
0
0
20 Feb 2025
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
Theodoros Kouzelis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
DRL
70
5
0
17 Feb 2025
Phantom: Subject-consistent video generation via cross-modal alignment
Lijie Liu
Tianxiang Ma
Bingchuan Li
Zhuowei Chen
Jiawei Liu
Qian He
Xinglong Wu
Qian He
Xinglong Wu
DiffM
VGen
50
5
0
16 Feb 2025
Towards Virtual Clinical Trials of Radiology AI with Conditional Generative Modeling
Benjamin Killeen
Bohua Wan
Aditya V. Kulkarni
Nathan G. Drenkow
Michael Oberst
Paul H. Yi
Mathias Unberath
MedIm
59
0
0
13 Feb 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
40
0
0
11 Feb 2025
Explaining 3D Computed Tomography Classifiers with Counterfactuals
Joseph Paul Cohen
Louis Blankemeier
Akshay S. Chaudhari
MedIm
105
0
0
11 Feb 2025
ConceptVAE: Self-Supervised Fine-Grained Concept Disentanglement from 2D Echocardiographies
C. Ciușdel
Alex Serban
Tiziano Passerini
CoGe
67
1
0
03 Feb 2025
UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping
Aashish Rai
Dilin Wang
Mihir Jain
N. Sarafianos
Arthur Chen
Srinath Sridhar
Aayush Prakash
3DGS
72
1
0
03 Feb 2025
Visual Generation Without Guidance
Huayu Chen
Kai Jiang
Kaiwen Zheng
Jianfei Chen
Hang Su
J. Zhu
55
0
0
28 Jan 2025
Image Motion Blur Removal in the Temporal Dimension with Video Diffusion Models
Wang Pang
Zhihao Zhan
Xiang Zhu
Yechao Bai
DiffM
71
1
0
22 Jan 2025
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
MLLM
VLM
LRM
109
4
0
21 Jan 2025
PeFoMed: Parameter Efficient Fine-tuning of Multimodal Large Language Models for Medical Imaging
Gang Liu
Jinlong He
Pengfei Li
Genrong He
Zixu Zhao
Shenjun Zhong
LM&MA
74
2
0
17 Jan 2025
Synthetic Prior for Few-Shot Drivable Head Avatar Inversion
Wojciech Zielonka
Stephan Garbin
Alexandros Lattas
George Kopanas
Paulo F. U. Gotardo
Thabo Beeler
Justus Thies
Timo Bolkart
69
2
0
12 Jan 2025
MedCoDi-M: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation
Daniele Molino
Francesco Di Feola
E. Faiella
Deborah Fazzini
D. Santucci
Linlin Shen
V. Guarrasi
Paolo Soda
SyDa
MedIm
39
0
0
10 Jan 2025
MObI: Multimodal Object Inpainting Using Diffusion Models
Alexandru Buburuzan
Anuj Sharma
John Redford
P. Dokania
Romain Mueller
DiffM
85
1
0
06 Jan 2025
Passive Non-Line-of-Sight Imaging with Light Transport Modulation
Jiarui Zhang
Ruixu Geng
Xiaolong Du
Yan Chen
Houqiang Li
Yang Hu
58
1
0
03 Jan 2025
TOTEM: TOkenized Time Series EMbeddings for General Time Series Analysis
Sabera Talukder
Yisong Yue
Georgia Gkioxari
AI4TS
40
12
0
03 Jan 2025
VidTwin: Video VAE with Decoupled Structure and Dynamics
Yuchi Wang
Junliang Guo
Xinyi Xie
Tianyu He
Xu Sun
Jiang Bian
DRL
VGen
73
3
0
23 Dec 2024
Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Text-to-Image Generation
Quan Dao
Hao Phung
T. Dao
Dimitris Metaxas
Anh Tran
90
1
0
22 Dec 2024
Next Patch Prediction for Autoregressive Visual Generation
Yatian Pang
Peng Jin
Shuo Yang
Bin Lin
Bin Zhu
...
Liuhan Chen
Francis E. H. Tay
Ser-Nam Lim
Harry Yang
Li Yuan
120
8
0
19 Dec 2024
Parallelized Autoregressive Visual Generation
Y. Wang
Shuhuai Ren
Zhijie Lin
Yujin Han
Haoyuan Guo
Zhenheng Yang
Difan Zou
Jiashi Feng
Xihui Liu
VGen
84
11
0
19 Dec 2024
MMO-IG: Multi-Class and Multi-Scale Object Image Generation for Remote Sensing
Chuang Yang
Bingxuan Zhao
Qing Zhou
Qi Wang
79
1
0
18 Dec 2024
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
H. Chen
Z. Wang
X. Li
X. Sun
Fangyi Chen
Jiang Liu
J. Wang
Bhiksha Raj
Zicheng Liu
Emad Barsoum
VLM
108
6
0
14 Dec 2024
Video Diffusion Transformers are In-Context Learners
Zhengcong Fei
Di Qiu
Changqian Yu
Debang Li
Mingyuan Fan
VGen
DiffM
145
2
0
14 Dec 2024
DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction
Ben Kaye
Tomas Jakab
Shangzhe Wu
Christian Rupprecht
Andrea Vedaldi
3DPC
3DH
97
1
0
05 Dec 2024
Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis
Anton Voronov
Denis Kuznedelev
Mikhail Khoroshikh
Valentin Khrulkov
Dmitry Baranchuk
106
2
0
02 Dec 2024
SerialGen: Personalized Image Generation by First Standardization Then Personalization
Cong Xie
Han Zou
Ruiqi Yu
Yan Zhang
Zhenpeng Zhan
64
1
0
02 Dec 2024
Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads
Siqi Kou
Jiachun Jin
Chang Liu
Ye Ma
Jian Jia
Quan Chen
Peng Jiang
Zhijie Deng
Zhijie Deng
DiffM
VGen
VLM
118
5
0
28 Nov 2024
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
Zongjian Li
Bin Lin
Yang Ye
Liuhan Chen
Xinhua Cheng
Shenghai Yuan
Li-xin Yuan
VGen
DiffM
107
16
0
26 Nov 2024
SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis
Hyojun Go
Byeongjun Park
Jiho Jang
Jin-Young Kim
Soonwoo Kwon
Changick Kim
3DGS
111
2
0
25 Nov 2024
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
Yongwei Chen
Yushi Lan
Shangchen Zhou
Tengfei Wang
Xingang Pan
100
5
0
25 Nov 2024
DiMoDif: Discourse Modality-information Differentiation for Audio-visual Deepfake Detection and Localization
C. Koutlis
Symeon Papadopoulos
58
2
0
15 Nov 2024
LDPM: Towards undersampled MRI reconstruction with MR-VAE and Latent Diffusion Prior
Xingjian Tang
Jingwei Guan
Linge Li
Youmei Zhang
Mengye Lyu
Li Yan
Li Yan
MedIm
DiffM
23
0
0
05 Nov 2024
CleAR: Robust Context-Guided Generative Lighting Estimation for Mobile Augmented Reality
Yiqin Zhao
Mallesham Dasari
Tian Guo
48
0
0
04 Nov 2024
Human-inspired Perspectives: A Survey on AI Long-term Memory
Zihong He
Weizhe Lin
Hao Zheng
Fan Zhang
Matt Jones
Laurence Aitchison
X. Xu
Miao Liu
Per Ola Kristensson
Junxiao Shen
77
2
0
01 Nov 2024
Diffusion Beats Autoregressive: An Evaluation of Compositional Generation in Text-to-Image Models
Arash Marioriyad
Parham Rezaei
M. Baghshah
M. Rohban
CoGe
94
0
0
30 Oct 2024
Bringing NeRFs to the Latent Space: Inverse Graphics Autoencoder
Antoine Schnepf
Karim Kassab
Jean-Yves Franceschi
Laurent Caraffa
Flavian Vasile
Jeremie Mary
Andrew Comport
Valérie Gouet-Brunet
35
2
0
30 Oct 2024
GiVE: Guiding Visual Encoder to Perceive Overlooked Information
Junjie Li
Jianghong Ma
Xiaofeng Zhang
Yuhang Li
Jianyang Shi
23
0
0
26 Oct 2024
Prioritized Generative Replay
Renhao Wang
Kevin Frans
Pieter Abbeel
Sergey Levine
Alexei A. Efros
OnRL
DiffM
91
2
0
23 Oct 2024
Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens
Zhepeng Cen
Yao Liu
Siliang Zeng
Pratik Chaudhar
Huzefa Rangwala
George Karypis
Rasool Fakoor
SyDa
AIFin
23
3
0
18 Oct 2024
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective
Yongxin Zhu
B. Li
Hang Zhang
Xin Li
Linli Xu
Lidong Bing
DiffM
21
9
0
16 Oct 2024
Previous
1
2
3
4
5
...
8
9
10
Next