2212.09748
Scalable Diffusion Models with Transformers
IEEE International Conference on Computer Vision (ICCV), 2023
19 December 2022
William S. Peebles
Saining Xie
GNN
Papers citing
"Scalable Diffusion Models with Transformers"
50 / 2,680 papers shown
Title
WithAnyone: Towards Controllable and ID Consistent Image Generation
H. Xu
Wei Cheng
Peng Xing
Yixiao Fang
Shuhan Wu
...
Xianfang Zeng
Daxin Jiang
Gang Yu
Xingjun Ma
Yu-Gang Jiang
DiffM
179
5
0
16 Oct 2025
DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models
Mor Ventura
Michael Toker
Or Patashnik
Yonatan Belinkov
Roi Reichart
116
0
0
16 Oct 2025
Adapting Self-Supervised Representations as a Latent Space for Efficient Generation
Ming Gui
Johannes Schusterbauer
Timy Phan
Felix Krause
J. Susskind
Miguel Angel Bautista
Bjorn Ommer
181
1
0
16 Oct 2025
Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation
Shaowei Liu
Chuan Guo
Bing Zhou
Jian Wang
DiffM
167
1
0
16 Oct 2025
Inpainting the Red Planet: Diffusion Models for the Reconstruction of Martian Environments in Virtual Reality
Giuseppe Lorenzo Catalano
Agata Marta Soccini
108
0
0
16 Oct 2025
From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance
Z. Li
Cheng Chi
Yangyang Wei
Boan Zhu
Yibo Peng
Tao Huang
Pengwei Wang
Zhongyuan Wang
Shanghang Zhang
Chang Xu
213
1
0
16 Oct 2025
Reinforcement Learning Meets Masked Generative Models: Mask-GRPO for Text-to-Image Generation
Yifu Luo
Xinhao Hu
Keyu Fan
Haoyuan Sun
Zeyu Chen
Bo Xia
Tiantian Zhang
Yongzhe Chang
Xueqian Wang
98
0
0
15 Oct 2025
End-to-End Multi-Modal Diffusion Mamba
Chunhao Lu
Qiang Lu
Meichen Dong
Jake Luo
110
3
0
15 Oct 2025
FlashWorld: High-quality 3D Scene Generation within Seconds
Xinyang Li
Tengfei Wang
Zixiao Gu
Shengchuan Zhang
Chunchao Guo
Liujuan Cao
3DGS
108
3
0
15 Oct 2025
UniCalli: A Unified Diffusion Framework for Column-Level Generation and Recognition of Chinese Calligraphy
Tianshuo Xu
Kai Wang
Zhifei Chen
Leyi Wu
Tianshui Wen
Fei Chao
Ying-Cong Chen
DiffM
56
0
0
15 Oct 2025
PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning
S. Ji
Xi Chen
Xin Tao
Pengfei Wan
Hengshuang Zhao
VGen
PINN
186
3
0
15 Oct 2025
Edit-Your-Interest: Efficient Video Editing via Feature Most-Similar Propagation
Yi Zuo
Zitao Wang
Lingling Li
Xu Liu
Fang Liu
Licheng Jiao
DiffM
VGen
92
0
0
15 Oct 2025
EReLiFM: Evidential Reliability-Aware Residual Flow Meta-Learning for Open-Set Domain Generalization under Noisy Labels
Kunyu Peng
Di Wen
Kailun Yang
Jia Fu
Yufan Chen
...
Junwei Zheng
M. Sarfraz
Luc Van Gool
Danda Pani Paudel
Rainer Stiefelhagen
160
0
0
14 Oct 2025
Efficient Inference for Coupled Hidden Markov Models in Continuous Time and Discrete Space
Giosue Migliorini
Padhraic Smyth
80
0
0
14 Oct 2025
Mitigating the Noise Shift for Denoising Generative Models via Noise Awareness Guidance
Jincheng Zhong
Boyuan Jiang
Xin Tao
Pengfei Wan
Kun Gai
Mingsheng Long
DiffM
84
0
0
14 Oct 2025
LayerSync: Self-aligning Intermediate Layers
Yasaman Haghighi
B. V. Delft
Mariam Hassan
Alexandre Alahi
99
0
0
14 Oct 2025
FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution
Junhao Zhuang
Shi Guo
Xin Cai
Xiaohui Li
Yihao Liu
Chun Yuan
Tianfan Xue
VGen
109
4
0
14 Oct 2025
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
Fuhao Li
Wenxuan Song
Han Zhao
Jingbo Wang
Pengxiang Ding
Donglin Wang
Long Zeng
Haoang Li
162
3
0
14 Oct 2025
BIGFix: Bidirectional Image Generation with Token Fixing
Victor Besnier
David Hurych
Andrei Bursuc
Eduardo Valle
VGen
92
0
0
14 Oct 2025
CoRA: Covariate-Aware Adaptation of Time Series Foundation Models
Guo Qin
Z. Chen
Yong Liu
Z. Shi
Haixuan Liu
Xiangdong Huang
Jianmin Wang
Mingsheng Long
AI4TS
AI4CE
88
0
0
14 Oct 2025
Your VAR Model is Secretly an Efficient and Explainable Generative Classifier
Yi-Chung Chen
David I. Inouye
Jing Gao
DiffM
VLM
120
0
0
14 Oct 2025
Audio Palette: A Diffusion Transformer with Multi-Signal Conditioning for Controllable Foley Synthesis
Junnuo Wang
DiffM
91
0
0
14 Oct 2025
SceneAdapt: Scene-aware Adaptation of Human Motion Diffusion
Jungbin Cho
Minsu Kim
Jisoo Kim
Ce Zheng
László A. Jeni
Ming-Hsuan Yang
Youngjae Yu
Seonjoo Kim
DiffM
VGen
TTA
196
0
0
14 Oct 2025
A Connection Between Score Matching and Local Intrinsic Dimension
Eric Yeats
Aaron Jacobson
Darryl Hannan
Yiran Jia
T. Doster
Henry Kvinge
Scott Mahan
DiffM
136
1
0
14 Oct 2025
There is No VAE: End-to-End Pixel-Space Generative Modeling via Self-Supervised Pre-training
Jiachen Lei
Keli Liu
Julius Berner
Haiming Yu
Hongkai Zheng
Jiahong Wu
Xiangxiang Chu
DiffM
209
2
0
14 Oct 2025
PAINT: Parallel-in-time Neural Twins for Dynamical System Reconstruction
Andreas Radler
Vincent Seyfried
Stefan Pirker
Johannes Brandstetter
Thomas Lichtenegger
112
1
0
14 Oct 2025
MosaicDiff: Training-free Structural Pruning for Diffusion Model Acceleration Reflecting Pretraining Dynamics
Bowei Guo
Shengkun Tang
Cong Zeng
Zhiqiang Shen
119
1
0
13 Oct 2025
ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation
Ruihang Xu
Dewei Zhou
Fan Ma
Yi Yang
DiffM
100
2
0
13 Oct 2025
Massive Activations are the Key to Local Detail Synthesis in Diffusion Transformers
Chaofan Gan
Zicheng Zhao
Yuanpeng Tu
Xi Chen
Ziran Qin
Yun Xu
Mehrtash Harandi
W. Lin
128
0
0
13 Oct 2025
DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training
Haoran Feng
D. Zhang
Xiangtai Li
Bo Du
Lu Qi
96
2
0
13 Oct 2025
WaveletDiff: Multilevel Wavelet Diffusion For Time Series Generation
Yu-Hsiang Wang
O. Milenkovic
DiffM
AI4TS
284
0
0
13 Oct 2025
LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference
Jianhao Yuan
Fabio Pizzati
Francesco Pinto
Lars Kunze
Ivan Laptev
Paul Newman
Philip Torr
D. Martini
DiffM
VGen
155
1
0
13 Oct 2025
Diffusion Transformers with Representation Autoencoders
Boyang Zheng
Nanye Ma
Shengbang Tong
Saining Xie
DiffM
158
32
0
13 Oct 2025
DiffStyleTS: Diffusion Model for Style Transfer in Time Series
Mayank Nagda
Phil Ostheimer
Justus Arweiler
Indra Jungjohann
Jennifer Werner
...
Michael Bortz
Hans Hasse
Stephan Mandt
Marius Kloft
Sophie Fellenz
DiffM
AI4TS
76
0
0
13 Oct 2025
Flow Matching-Based Autonomous Driving Planning with Advanced Interactive Behavior Modeling
Tianyi Tan
Yinan Zheng
Ruiming Liang
Zexu Wang
Kexin Zheng
Jinliang Zheng
Jianxiong Li
Xianyuan Zhan
Jingjing Liu
80
3
0
13 Oct 2025
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
Ganlin Yang
Tianyi Zhang
Haoran Hao
Weiyun Wang
Y. Liu
...
Jiangmiao Pang
Gen Luo
Wenhai Wang
Yao Mu
Zhi Hou
LM&Ro
LRM
124
2
0
13 Oct 2025
Unified Open-World Segmentation with Multi-Modal Prompts
Yang Liu
Yufei Yin
Chenchen Jing
M. Zhu
Hao Chen
Yuling Xi
Bo Feng
Hao Wang
Shiyu Li
Chunhua Shen
VLM
82
0
0
12 Oct 2025
AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes
Yu Li
Menghan Xia
Gongye Liu
J. Bai
Xintao Wang
Conglang Zhang
Yuxuan Lin
Ruihang Chu
Pengfei Wan
Yujiu Yang
VGen
80
1
0
12 Oct 2025
Latent Retrieval Augmented Generation of Cross-Domain Protein Binders
Zishen Zhang
Xiangzhe Kong
Wenbing Huang
Yang Liu
DiffM
146
0
0
12 Oct 2025
DEMO: Disentangled Motion Latent Flow Matching for Fine-Grained Controllable Talking Portrait Synthesis
Peiyin Chen
Zhuowei Yang
Hui Feng
Sheng Jiang
Rui Yan
DiffM
VGen
76
0
0
12 Oct 2025
UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation
Zhengrong Yue
H. Zhang
Xiangyu Zeng
Boyu Chen
Chenting Wang
...
Lu Dong
Kunpeng Du
Yi Wang
Limin Wang
Yali Wang
160
5
0
12 Oct 2025
Head-wise Adaptive Rotary Positional Encoding for Fine-Grained Image Generation
Jiaye Li
Baoyou Chen
Hui Li
Zilong Dong
Jingdong Wang
Siyu Zhu
64
0
0
12 Oct 2025
ProteinAE: Protein Diffusion Autoencoders for Structure Encoding
Shaoning Li
Le Zhuo
Yusong Wang
Mingyu Li
Xinheng He
Fandi Wu
Jiaming Song
Pheng-Ann Heng
DiffM
101
0
0
12 Oct 2025
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
Jinliang Zheng
Jianxiong Li
Zhihao Wang
Dongxiu Liu
Xirui Kang
...
Ya-Qin Zhang
Jiangmiao Pang
Jingjing Liu
Tai Wang
Xianyuan Zhan
LM&Ro
200
6
0
11 Oct 2025
SaFiRe: Saccade-Fixation Reiteration with Mamba for Referring Image Segmentation
Zhenjie Mao
Yuhuan Yang
Chaofan Ma
Dongsheng Jiang
Jiangchao Yao
Ya Zhang
Yanfeng Wang
100
0
0
11 Oct 2025
EditCast3D: Single-Frame-Guided 3D Editing with Video Propagation and View Selection
Huaizhi Qu
Ruichen Zhang
Shuqing Luo
Luchao Qi
Zhihao Zhang
Xiaoming Liu
Roni Sengupta
Tianlong Chen
DiffM
VGen
96
0
0
11 Oct 2025
Multi-Scale Diffusion Transformer for Jointly Simulating User Mobility and Mobile Traffic Pattern
Ziyi Liu
Qingyue Long
Zhiwen Xue
Huandong Wang
Yong Li
40
0
0
11 Oct 2025
SyncLipMAE: Contrastive Masked Pretraining for Audio-Visual Talking-Face Representation
Zeyu Ling
Xiaodong Gu
Jiangnan Tang
Changqing Zou
CLIP
96
0
0
11 Oct 2025
Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation
Yao Teng
Fuyun Wang
Xian Liu
Z. Chen
Han Shi
Yu Wang
Zhenguo Li
Weiyang Liu
Difan Zou
Xihui Liu
DiffM
105
0
0
10 Oct 2025
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
Chenyu Wang
Paria Rashidinejad
DiJia Su
Song Jiang
S. Wang
...
Shannon Zejiang Shen
Feiyu Chen
Tommi Jaakkola
Yuandong Tian
Bo Liu
OffRL
168
5
0
10 Oct 2025