2212.09748
Scalable Diffusion Models with Transformers
IEEE International Conference on Computer Vision (ICCV), 2023
19 December 2022
William S. Peebles
Saining Xie
GNN
Papers citing
"Scalable Diffusion Models with Transformers"
50 / 2,711 papers shown
LAMIC: Layout-Aware Multi-Image Composition via Scalability of Multimodal Diffusion Transformer
Yuzhuo Chen
Zehua Ma
Jianhua Wang
Kai Kang
Shunyu Yao
Weiming Zhang
VLM
167
2
0
24 Dec 2025
Denoise to Track: Harnessing Video Diffusion Priors for Robust Correspondence
Tianyu Yuan
Yuanbo Yang
Lin Chen
Yao Yao
Zhuzhong Qian
DiffM
VGen
238
0
0
04 Dec 2025
YingMusic-SVC: Real-World Robust Zero-Shot Singing Voice Conversion with Flow-GRPO and Singing-Specific Inductive Biases
Gongyu Chen
Xiaoyu Zhang
Zhenqiang Weng
Junjie Zheng
Da Shen
Chaofan Ding
Wei-Qiang Zhang
Zihao Chen
49
0
0
04 Dec 2025
Efficient Generative Transformer Operators For Million-Point PDEs
Armand K. Koupai
Lise Le Boudec
Patrick Gallinari
68
0
0
04 Dec 2025
Refaçade: Editing Object with Given Reference Texture
Youze Huang
Penghui Ruan
Bojia Zi
Xianbiao Qi
Jianan Wang
Rong Xiao
DiffM
180
0
0
04 Dec 2025
ReflexFlow: Rethinking Learning Objective for Exposure Bias Alleviation in Flow Matching
Guanbo Huang
Jingjia Mao
Fanding Huang
Fengkai Liu
Xiangyang Luo
...
Jiasheng Lu
X. Wang
Pei Liu
Ruiliu Fu
Shao-Lun Huang
145
0
0
04 Dec 2025
UniTS: Unified Time Series Generative Model for Remote Sensing
Yuxiang Zhang
Shunlin Liang
Wenyuan Li
Han Ma
Jianglei Xu
...
Jiangwei Xie
Wei Li
Mengmeng Zhang
R. Tao
X. Xia
DiffM
AI4TS
266
0
0
04 Dec 2025
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
Yanran Zhang
Ziyi Wang
Wenzhao Zheng
Zheng Zhu
Jie Zhou
Jiwen Lu
VGen
3DV
235
0
0
04 Dec 2025
VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory
Yifei Yu
Xiaoshan Wu
Xinting Hu
Tao Hu
Yangtian Sun
...
Bo Wang
Lin Ma
Yuewen Ma
Zhongrui Wang
Xiaojuan Qi
DiffM
VGen
182
1
0
04 Dec 2025
A Sanity Check for Multi-In-Domain Face Forgery Detection in the Real World
Jikang Cheng
Renye Yan
Zhiyuan Yan
Yaozhong Gan
Xueyi Zhang
Zhongyuan Wang
Wei Peng
Ling Liang
123
0
0
04 Dec 2025
AdaPower: Specializing World Foundation Models for Predictive Manipulation
Yuhang Huang
Shilong Zou
J. Zhang
Xinwang Liu
Ruizhen Hu
Kai Xu
80
0
0
03 Dec 2025
ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers
Feice Huang
Zuliang Han
Xing Zhou
Yihuang Chen
Lifei Zhu
Haoqian Wang
MQ
160
0
0
03 Dec 2025
Beyond Boundary Frames: Audio-Visual Semantic Guidance for Context-Aware Video Interpolation
Yuchen Deng
Xiuyang Wu
Hai-Tao Zheng
Jie Wang
Feidiao Yang
Yuxing Han
VGen
223
0
0
03 Dec 2025
CoDA: From Text-to-Image Diffusion Models to Training-Free Dataset Distillation
Letian Zhou
Songhua Liu
Xinchao Wang
153
0
0
03 Dec 2025
GeoVideo: Introducing Geometric Regularization into Video Generation Model
Yunpeng Bai
Shaoheng Fang
Chaohui Yu
Fan Wang
Qixing Huang
DiffM
VGen
MDE
459
2
0
03 Dec 2025
CSMapping: Scalable Crowdsourced Semantic Mapping and Topology Inference for Autonomous Driving
Zhijian Qiao
Zehuan Yu
Tong Li
Chih-Chung Chou
Wenchao Ding
Shaojie Shen
102
0
0
03 Dec 2025
C3G: Learning Compact 3D Representations with 2K Gaussians
Honggyu An
Jaewoo Jung
Mungyeom Kim
Sunghwan Hong
Chaehyun Kim
...
Takuya Narihira
Hyuna Ko
J. Kim
Yuki Mitsufuji
Seungryong Kim
3DGS
3DV
215
0
0
03 Dec 2025
FloodDiffusion: Tailored Diffusion Forcing for Streaming Motion Generation
Yiyi Cai
Y. Wu
Kunhang Li
You Zhou
Bo Zheng
Haiyang Liu
VGen
114
0
0
03 Dec 2025
SimFlow: Simplified and End-to-End Training of Latent Normalizing Flows
Qinyu Zhao
Guangting Zheng
Tao Yang
Rui Zhu
Xingjian Leng
Stephen Gould
Liang Zheng
DRL
187
0
0
03 Dec 2025
YingVideo-MV: Music-Driven Multi-Stage Video Generation
Jiahui Chen
Weida Wang
Runhua Shi
Huan Yang
Chaofan Ding
Zihao Chen
DiffM
VGen
244
0
0
02 Dec 2025
MAViD: A Multimodal Framework for Audio-Visual Dialogue Understanding and Generation
Youxin Pang
Jiajun Liu
L. Tan
Yong Zhang
Feng Gao
Xiang Deng
Zhuoliang Kang
Xiaoming Wei
Y. Liu
VGen
127
0
0
02 Dec 2025
Hear What Matters! Text-conditioned Selective Video-to-Audio Generation
Junwon Lee
Juhan Nam
Jiyoung Lee
DiffM
VGen
112
0
0
02 Dec 2025
MultiShotMaster: A Controllable Multi-Shot Video Generation Framework
Qinghe Wang
Xiaoyu Shi
Baolu Li
Weikang Bian
Quande Liu
Huchuan Lu
Xintao Wang
Pengfei Wan
Kun Gai
Xu Jia
VGen
215
2
0
02 Dec 2025
Video2Act: A Dual-System Video Diffusion Policy with Robotic Spatio-Motional Modeling
Yueru Jia
Jiaming Liu
Shengbang Liu
Rui Zhou
W. Yu
Yuyang Yan
Xiaowei Chi
Yandong Guo
Boxin Shi
Shanghang Zhang
VGen
306
1
0
02 Dec 2025
Taming Camera-Controlled Video Generation with Verifiable Geometry Reward
Zhaoqing Wang
Xiaobo Xia
Zhuolin Bie
Jinlin Liu
Dongdong Yu
Jia-Wang Bian
Changhu Wang
EGVM
VGen
156
0
0
02 Dec 2025
Generative Editing in the Joint Vision-Language Space for Zero-Shot Composed Image Retrieval
Xin Wang
H. Zhang
Mang Li
Zhaohui Xia
Y. Chen
Yu Zhang
Chunyu Wei
DiffM
154
0
0
01 Dec 2025
Open-world Hand-Object Interaction Video Generation Based on Structure and Contact-aware Representation
Haodong Yan
Hang Yu
Zhide Zhong
Weilin Yuan
Xin Gong
...
Chengxi Heyu
Junfeng Li
Wenxuan Song
Shunbo Zhou
Haoang Li
72
0
0
01 Dec 2025
DreamingComics: A Story Visualization Pipeline via Subject and Layout Customized Generation using Video Models
Patrick Kwon
Chen Chen
DiffM
AI4TS
VGen
153
0
0
01 Dec 2025
Reversible Inversion for Training-Free Exemplar-guided Image Editing
Yuke Li
Lianli Gao
Ji Zhang
Pengpeng Zeng
Lichuan Xiang
Hongkai Wen
Heng Tao Shen
Jingkuan Song
DiffM
131
0
0
01 Dec 2025
ViT³: Unlocking Test-Time Training in Vision
Dongchen Han
Y. Li
Tianyu Li
Z. Cao
Ziming Wang
Jun Song
Yu Cheng
Bo Zheng
Gao Huang
ViT
76
0
0
01 Dec 2025
Modality-Augmented Fine-Tuning of Foundation Robot Policies for Cross-Embodiment Manipulation on GR1 and G1
Junsung Park
Hogun Kee
Songhwai Oh
113
0
0
01 Dec 2025
Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe
Y. Liu
Yang Yue
Jingyuan Zhang
Chenxi Sun
Yang Zhou
Wencong Zeng
Ruiming Tang
Guorui Zhou
DiffM
MoE
114
0
0
01 Dec 2025
TokenPure: Watermark Removal through Tokenized Appearance and Structural Guidance
Pei Yang
Y. Liu
Kelly Peng
Yuan Gao
Yiren Song
WIGM
201
0
0
01 Dec 2025
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models
Zhiheng Liu
Weiming Ren
Haozhe Liu
Zijian Zhou
S. Chen
...
Ping Luo
Wei Liu
Tao Xiang
Jonas Schult
Yuren Cong
162
1
0
01 Dec 2025
FRAMER: Frequency-Aligned Self-Distillation with Adaptive Modulation Leveraging Diffusion Priors for Real-World Image Super-Resolution
Seungho Choi
Jeahun Sung
Jihyong Oh
DiffM
162
0
0
01 Dec 2025
SpriteHand: Real-Time Versatile Hand-Object Interaction with Autoregressive Video Generation
Zisu Li
Hengye Lyu
Jiaxin Shi
Yufeng Zeng
Mingming Fan
Hanwang Zhang
Chen Liang
VGen
189
0
0
01 Dec 2025
Improved Mean Flows: On the Challenges of Fastforward Generative Models
Zhengyang Geng
Yiyang Lu
Zongze Wu
Eli Shechtman
J. Zico Kolter
Kaiming He
AI4CE
138
3
0
01 Dec 2025
ResDiT: Evoking the Intrinsic Resolution Scalability in Diffusion Transformers
Yiyang Ma
Feng Zhou
Xuedan Yin
Pu Cao
Yonghao Dang
Jianqin Yin
97
0
0
01 Dec 2025
Dynamic-eDiTor: Training-Free Text-Driven 4D Scene Editing with Multimodal Diffusion Transformer
Dong In Lee
Hyungjun Doh
Seunggeun Chi
Runlin Duan
Sangpil Kim
K. Ramani
DiffM
3DGS
VGen
145
0
0
30 Nov 2025
CycleManip: Enabling Cyclic Task Manipulation via Effective Historical Perception and Understanding
Yi-Lin Wei
Haoran Liao
Yuhao Lin
Pengyue Wang
Zhizhao Liang
Guiliang Liu
Wei-Shi Zheng
57
0
0
30 Nov 2025
Silhouette-based Gait Foundation Model
Dingqiang Ye
Chao Fan
Kartik Narayan
Bingzhe Wu
Chengwen Luo
Jianqiang Li
Vishal M. Patel
65
0
0
30 Nov 2025
TrajDiff: End-to-end Autonomous Driving without Perception Annotation
Xingtai Gui
Jianbo Zhao
Wencheng Han
Jikai Wang
Jiahao Gong
Feiyang Tan
Cheng-Zhong Xu
Jianbing Shen
80
1
0
30 Nov 2025
Audio-Visual World Models: Towards Multisensory Imagination in Sight and Sound
Jiahua Wang
Shannan Yan
Leqi Zheng
Jialong Wu
Yaoxin Mao
VGen
162
0
0
30 Nov 2025
UniDiff: Parameter-Efficient Adaptation of Diffusion Models for Land Cover Classification with Multi-Modal Remotely Sensed Imagery and Sparse Annotations
Yuzhen Hu
Saurabh Prasad
67
0
0
29 Nov 2025
Optimizing Distributional Geometry Alignment with Optimal Transport for Generative Dataset Distillation
Xiao Cui
Yulei Qin
Wengang Zhou
Hongsheng Li
Houqiang Li
DD
OT
232
1
0
29 Nov 2025
Image Generation as a Visual Planner for Robotic Manipulation
Ye Pang
VGen
90
0
0
29 Nov 2025
PhysGen: Physically Grounded 3D Shape Generation for Industrial Design
Yingxuan You
Chen Zhao
Hantao Zhang
Mingda Xu
Pascal Fua
AI4CE
90
0
0
29 Nov 2025
What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards
Minh-Quan Le
Yuanzhi Zhu
Vicky Kalogeiton
Dimitris Samaras
EGVM
VGen
91
1
0
29 Nov 2025
CC-FMO: Camera-Conditioned Zero-Shot Single Image to 3D Scene Generation with Foundation Model Orchestration
Boshi Tang
Henry Zheng
Rui Huang
Gao Huang
VGen
196
0
0
29 Nov 2025
LAP: Fast LAtent Diffusion Planner with Fine-Grained Feature Distillation for Autonomous Driving
Jinhao Zhang
Wenlong Xia
Zhexuan Zhou
Youmin Gong
Jie Mei
188
0
0
29 Nov 2025