2212.09748
Scalable Diffusion Models with Transformers
IEEE International Conference on Computer Vision (ICCV), 2023
19 December 2022
William S. Peebles
Saining Xie
GNN
Papers citing "Scalable Diffusion Models with Transformers"
50 / 2,712 papers shown
Optimizing Distributional Geometry Alignment with Optimal Transport for Generative Dataset Distillation
Xiao Cui
Yulei Qin
Wengang Zhou
Hongsheng Li
Houqiang Li
DD
OT
237
1
0
29 Nov 2025
PhysGen: Physically Grounded 3D Shape Generation for Industrial Design
Yingxuan You
Chen Zhao
Hantao Zhang
Mingda Xu
Pascal Fua
AI4CE
100
0
0
29 Nov 2025
CC-FMO: Camera-Conditioned Zero-Shot Single Image to 3D Scene Generation with Foundation Model Orchestration
Boshi Tang
Henry Zheng
Rui Huang
Gao Huang
VGen
197
0
0
29 Nov 2025
What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards
Minh-Quan Le
Yuanzhi Zhu
Vicky Kalogeiton
Dimitris Samaras
EGVM
VGen
91
1
0
29 Nov 2025
VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction
Sinan Du
Jiahao Guo
Bo Li
Shuhao Cui
Zhengzhuo Xu
...
Yongxian Wei
Kun Gai
X. Wang
Kai Wu
C. Yuan
224
1
0
28 Nov 2025
DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation
Hongfei Zhang
Kanghao Chen
Zixin Zhang
Harold Haodong Chen
Yuanhuiyi Lyu
Yuqi Zhang
Shuai Yang
Kun Zhou
Yingcong Chen
DiffM
VGen
186
2
0
28 Nov 2025
Vision Bridge Transformer at Scale
Zhenxiong Tan
Zeqing Wang
Xingyi Yang
Songhua Liu
Xinchao Wang
DiffM
107
0
0
28 Nov 2025
DisMo: Disentangled Motion Representations for Open-World Motion Transfer
Thomas Ressler-Antal
Frank Fundel
Malek Ben Alaya
S. A. Baumann
Felix Krause
Ming Gui
Bjorn Ommer
DiffM
VGen
108
0
0
28 Nov 2025
BlockVid: Block Diffusion for High-Quality and Consistent Minute-Long Video Generation
Zeyu Zhang
Shuning Chang
Yuanyu He
Yizeng Han
Jiasheng Tang
Fan Wang
Bohan Zhuang
DiffM
VGen
200
2
0
28 Nov 2025
AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement
Zhizhou Zhong
Yicheng Ji
Zhe Kong
Y. Liu
Jiarui Wang
...
Ying Qin
Huan Li
Shuiyang Mao
W. Liu
Wenhan Luo
DiffM
VGen
127
2
0
28 Nov 2025
Guiding Visual Autoregressive Models through Spectrum Weakening
Chaoyang Wang
Tianmeng Yang
Jingdong Wang
Yunhai Tong
DiffM
176
0
0
28 Nov 2025
McSc: Motion-Corrective Preference Alignment for Video Generation with Self-Critic Hierarchical Reasoning
Q. Yang
Yingjie Chen
Yuan Yao
Yifang Men
Huaizhuo Liu
Miaomiao Cui
EGVM
VGen
258
0
0
28 Nov 2025
GOATex: Geometry & Occlusion-Aware Texturing
Hyunjin Kim
Kunho Kim
Adam Lee
Wonkwang Lee
DiffM
107
0
0
28 Nov 2025
One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer
S. Shi
Jing Xu
Zhihang Li
Chunli Peng
Xiaoda Yang
Lijing Lu
Kai Hu
Jiangning Zhang
DiffM
133
0
0
28 Nov 2025
db-SP: Accelerating Sparse Attention for Visual Generative Models with Dual-Balanced Sequence Parallelism
Siqi Chen
Ke Hong
Tianchen Zhao
Ruiqi Xie
Zhenhua Zhu
X. Zhang
Yu Wang
MoE
113
0
0
28 Nov 2025
Scalable Diffusion Transformer for Conditional 4D fMRI Synthesis
Jungwoo Seo
David K. Park
Shinjae Yoo
Jiook Cha
MedIm
260
0
0
28 Nov 2025
InstanceV: Instance-Level Video Generation
Yuheng Chen
Teng Hu
Jiangning Zhang
Zhucun Xue
Ran Yi
Lizhuang Ma
DiffM
VGen
126
0
0
28 Nov 2025
Flow Straighter and Faster: Efficient One-Step Generative Modeling via MeanFlow on Rectified Trajectories
Xinxi Zhang
Shiwei Tan
Quang Nguyen
Quan Dao
Ligong Han
Xiaoxiao He
Tunyu Zhang
Alen Mrdovic
Dimitris N. Metaxas
269
1
0
28 Nov 2025
Toward Diffusible High-Dimensional Latent Spaces: A Frequency Perspective
Bolin Lai
Xudong Wang
Saketh Rambhatla
James M. Rehg
Zsolt Kira
Rohit Girdhar
Ishan Misra
DiffM
133
0
0
27 Nov 2025
TTSnap: Test-Time Scaling of Diffusion Models via Noise-Aware Pruning
Qingtao Yu
Changlin Song
Minghao Sun
Zhengyang Yu
Vinay Kumar Verma
Soumya Roy
Sumit Negi
Hongdong Li
Dylan Campbell
103
0
0
27 Nov 2025
Adversarial Flow Models
Shanchuan Lin
Ceyuan Yang
Zhijie Lin
Hao Chen
Haoqi Fan
GAN
154
0
0
27 Nov 2025
ITS3D: Inference-Time Scaling for Text-Guided 3D Diffusion Models
Zhenglin Zhou
Fan Ma
Xiaobo Xia
Hehe Fan
Yi Yang
Tat-Seng Chua
DiffM
3DGS
127
0
0
27 Nov 2025
Generative Anchored Fields: Controlled Data Generation via Emergent Velocity Fields and Transport Algebra
Deressa Wodajo Deressa
Hannes Mareen
Peter Lambert
Glenn Van Wallendael
69
0
0
27 Nov 2025
StreamFlow: Theory, Algorithm, and Implementation for High-Efficiency Rectified Flow Generation
Sen Fang
Hongbin Zhong
Yalin Feng
Dimitris N. Metaxas
156
1
0
27 Nov 2025
IMTalker: Efficient Audio-driven Talking Face Generation with Implicit Motion Transfer
Bo Chen
Tao Liu
Qi Chen
Xie Chen
Zilong Zheng
VGen
100
0
0
27 Nov 2025
ReasonEdit: Towards Reasoning-Enhanced Image Editing Models
Fukun Yin
Shiyu Liu
Yucheng Han
Zhibo Wang
Peng Xing
...
Pengtao Chen
Xiangyu Zhang
Daxin Jiang
Xianfang Zeng
Gang Yu
DiffM
KELM
LRM
252
0
0
27 Nov 2025
MobileI2V: Fast and High-Resolution Image-to-Video on Mobile Devices
Shuai Zhang
Bao Tang
Siyuan Yu
Yueting Zhu
Jingfeng Yao
Ya Zou
Shanglin Yuan
Li Yu
Wenyu Liu
Xinggang Wang
DiffM
VGen
214
0
0
26 Nov 2025
Which Layer Causes Distribution Deviation? Entropy-Guided Adaptive Pruning for Diffusion and Flow Models
Changlin Li
Jiawei Zhang
Z. Shi
Zongxin Yang
Zhihui Li
Xiaojun Chang
DiffM
VLM
267
0
0
26 Nov 2025
Canvas-to-Image: Compositional Image Generation with Multimodal Controls
Yusuf Dalva
Guocheng Qian
Maya Goldenberg
Tsai-Shien Chen
Kfir Aberman
Sergey Tulyakov
Pinar Yanardag
Kuan-Chieh Wang
DiffM
222
0
0
26 Nov 2025
MoGAN: Improving Motion Quality in Video Diffusion via Few-Step Motion Adversarial Post-Training
Haotian Xue
Qi-An Chen
Zhonghao Wang
Xun Huang
Eli Shechtman
Jinrong Xie
Yongxin Chen
DiffM
VGen
540
0
0
26 Nov 2025
Deep Parameter Interpolation for Scalar Conditioning
Chicago Y. Park
Michael T. McCann
Cristina Garcia-Cardona
B. Wohlberg
Ulugbek S. Kamilov
AI4CE
280
0
0
26 Nov 2025
DiverseVAR: Balancing Diversity and Quality of Next-Scale Visual Autoregressive Models
Mingue Park
Prin Phunyaphibarn
Phillip Y. Lee
Minhyuk Sung
124
0
0
26 Nov 2025
FaithFusion: Harmonizing Reconstruction and Generation via Pixel-wise Information Gain
Y. Wang
Xiaofan Li
Chi Huang
Wenhao Zhang
Hao Li
Bosheng Wang
Xun Sun
Jun Wang
DiffM
200
0
0
26 Nov 2025
Efficient Training for Human Video Generation with Entropy-Guided Prioritized Progressive Learning
Changlin Li
Jiawei Zhang
Shuhao Liu
Sihao Lin
Z. Shi
Zhihui Li
Xiaojun Chang
DiffM
VGen
279
0
0
26 Nov 2025
3MDiT: Unified Tri-Modal Diffusion Transformer for Text-Driven Synchronized Audio-Video Generation
Y. Li
Heyu Si
Federico Landi
Pilar Oplustil Gallegos
Ioannis Koutsoumpas
...
Ruiju Fu
Qi Guo
Xin Jin
Shunyu Liu
Mingli Song
DiffM
VGen
193
0
0
26 Nov 2025
SocialNav: Training Human-Inspired Foundation Model for Socially-Aware Embodied Navigation
Ziyi Chen
Yingnan Guo
Zedong Chu
Minghua Luo
Yanfen Shen
...
Lu Liu
Honglin Han
X. Wu
Mu Xu
Yu Zhang
551
0
0
26 Nov 2025
Saddle-Free Guidance: Improved On-Manifold Sampling without Labels or Additional Training
Eric Yeats
Darryl Hannan
Wilson Fearn
T. Doster
Henry Kvinge
Scott Mahan
DiffM
133
0
0
26 Nov 2025
Going with the Speed of Sound: Pushing Neural Surrogates into Highly-turbulent Transonic Regimes
Fabian Paischer
Leo Cotteleer
Yann Dreze
Richard Kurle
Dylan Rubini
Maurits Bleeker
Tobias Kronlachner
Johannes Brandstetter
AI4CE
221
1
0
26 Nov 2025
DUO-TOK: Dual-Track Semantic Music Tokenizer for Vocal-Accompaniment Generation
Rui Lin
Zhiyue Wu
Jiahe Le
Kangdi Wang
Weixiong Chen
Junyu Dai
Tao Jiang
171
1
0
25 Nov 2025
DINO-Tok: Adapting DINO for Visual Tokenizers
Mingkai Jia
Mingxiao Li
Liaoyuan Fan
Tianxing Shi
Jiaxin Guo
...
Xiaoyang Guo
Xiao-Xiao Long
Qian Zhang
P. Tan
Wei Yin
ViT
201
0
0
25 Nov 2025
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation
Inferix Team
Tianyu Feng
Yizeng Han
Jiahao He
Yuanyu He
...
Jichao Wu
M. Yang
Yinghao Yu
Zeyu Zhang
Bohan Zhuang
VGen
SyDa
327
1
0
25 Nov 2025
A Training-Free Approach for Multi-ID Customization via Attention Adjustment and Spatial Control
Jiawei Lin
Guanlong Jiao
Jianjin Xu
288
0
0
25 Nov 2025
Layer-Aware Video Composition via Split-then-Merge
Ozgur Kara
Yujia Chen
Ming-Hsuan Yang
James M. Rehg
Wen-Sheng Chu
Du Tran
VGen
186
0
0
25 Nov 2025
HiCoGen: Hierarchical Compositional Text-to-Image Generation in Diffusion Models via Reinforcement Learning
Hongji Yang
Yucheng Zhou
Wencheng Han
Runzhou Tao
Zhongying Qiu
Jianfei Yang
Jianbing Shen
DiffM
EGVM
369
0
0
25 Nov 2025
Rectified SpaAttn: Revisiting Attention Sparsity for Efficient Video Generation
Xuewen Liu
Zhikai Li
Jing Zhang
Mengjuan Chen
Qingyi Gu
VGen
151
0
0
25 Nov 2025
PromptMoG: Enhancing Diversity in Long-Prompt Image Generation via Prompt Embedding Mixture-of-Gaussian Sampling
Bo-Kai Ruan
Teng-Fang Hsiao
Ling Lo
Yi-Lun Wu
Hong-Han Shuai
DiffM
VLM
189
0
0
25 Nov 2025
Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout
Hidir Yesiltepe
Tuna Han Salih Meral
Adil Kaan Akan
Kaan Oktay
Pinar Yanardag
VGen
233
5
0
25 Nov 2025
A Reason-then-Describe Instruction Interpreter for Controllable Video Generation
Shengqiong Wu
Weicai Ye
Y. Zhang
Jiahao Wang
Quande Liu
Xintao Wang
Pengfei Wan
Kun Gai
Hao Fei
Tat-Seng Chua
VGen
LRM
188
0
0
25 Nov 2025
OmniRefiner: Reinforcement-Guided Local Diffusion Refinement
Yaoli Liu
Ziheng Ouyang
Shengtao Lou
Yiren Song
206
0
0
25 Nov 2025
Flash-DMD: Towards High-Fidelity Few-Step Image Generation with Efficient Distillation and Joint Reinforcement Learning
Guanjie Chen
Shirui Huang
Kai Liu
J. Zhu
Xiaoye Qu
Peng Chen
Yu Cheng
Yifu Sun
208
1
0
25 Nov 2025