2212.09748
Scalable Diffusion Models with Transformers
IEEE International Conference on Computer Vision (ICCV), 2022
19 December 2022
William S. Peebles
Saining Xie
GNN
Papers citing "Scalable Diffusion Models with Transformers"
50 / 2,711 papers shown
Diffusion Transformer meets Multi-level Wavelet Spectrum for Single Image Super-Resolution
Peng Du
Hui Li
Han Xu
Paul Barom Jeon
Dongwook Lee
Daehyun Ji
Ran Yang
Feng Zhu
405
0
0
03 Nov 2025
EVLP: Learning Unified Embodied Vision-Language Planner with Reinforced Supervised Fine-Tuning
Xinyan Cai
Shiguang Wu
Dafeng Chi
Yuzheng Zhuang
Xingyue Quan
Jianye Hao
Qiang Guan
100
0
0
03 Nov 2025
Occlusion-Aware Diffusion Model for Pedestrian Intention Prediction
Yu Liu
Zhijie Liu
Zedong Yang
You-Fu Li
He Kong
DiffM
258
0
0
02 Nov 2025
RefVTON: person-to-person Try on with Additional Unpaired Visual Reference
Liuzhuozheng Li
Yue Gong
Shanyuan Liu
Bo Cheng
Yuhang Ma
Liebucha Wu
Dengyang Jiang
Zanyi Wang
Dawei Leng
Yuhui Yin
350
0
0
02 Nov 2025
Linear Differential Vision Transformer: Learning Visual Contrasts via Pairwise Differentials
Yifan Pu
Jixuan Ying
Qixiu Li
Tianzhu Ye
Dongchen Han
Xiaochen Wang
Ziyi Wang
Xinyu Shao
Gao Huang
Xiu Li
ViT
126
0
0
02 Nov 2025
ID-Crafter: VLM-Grounded Online RL for Compositional Multi-Subject Video Generation
Panwang Pan
Jingjing Zhao
Yuchen Lin
Chenguo Lin
Chenxin Li
Haopeng Li
Honglei Yan
Tingting Shen
DiffM
VGen
354
0
0
01 Nov 2025
iFlyBot-VLA Technical Report
Yuan Zhang
Chenyu Xue
Wenjie Xu
Chao Ji
Jiajia Wu
Jia Pan
LM&Ro
304
0
0
01 Nov 2025
MIFO: Learning and Synthesizing Multi-Instance from One Image
Kailun Su
Ziqi He
Xi Wang
Yang Zhou
104
0
0
01 Nov 2025
MolChord: Structure-Sequence Alignment for Protein-Guided Drug Design
Wei Zhang
Zekun Guo
Yingce Xia
Peiran Jin
Shufang Xie
Tao Qin
Xiang Li
106
1
0
31 Oct 2025
E-MMDiT: Revisiting Multimodal Diffusion Transformer Design for Fast Image Synthesis under Limited Resources
Tong Shen
Jingai Yu
Dong Zhou
Dong Li
E. Barsoum
DiffM
131
0
0
31 Oct 2025
Foundation Models for Trajectory Planning in Autonomous Driving: A Review of Progress and Open Challenges
Kemal Oksuz
Alexandru Buburuzan
Anthony Knittel
Yuhan Yao
P. Dokania
83
0
0
31 Oct 2025
Generative Semantic Coding for Ultra-Low Bitrate Visual Communication and Analysis
Weiming Chen
Yijia Wang
Zhihan Zhu
Z. He
DiffM
134
0
0
31 Oct 2025
Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model
John Won
Kyungmin Lee
Huiwon Jang
Dongyoung Kim
Jinwoo Shin
204
3
0
31 Oct 2025
Learning Generalizable Visuomotor Policy through Dynamics-Alignment
Dohyeok Lee
Jung Min Lee
Munkyung Kim
Seokhun Ju
Jin Woo Koo
Kyungjae Lee
Dohyeong Kim
Taehyun Cho
Jungwoo Lee
99
0
0
31 Oct 2025
InertialAR: Autoregressive 3D Molecule Generation with Inertial Frames
Haorui Li
Weitao Du
Yuqiang Li
Ziqiao Wang
Shengchao Liu
151
1
0
31 Oct 2025
UniTok-Audio: A Unified Audio Generation Framework via Generative Modeling on Discrete Codec Tokens
Chengwei Liu
Haoyin Yan
Shaofei Xue
Xiaotao Liang
Yinghao Liu
Zheng Xue
Gang Song
Boyang Zhou
239
2
0
30 Oct 2025
Jasmine: A Simple, Performant and Scalable JAX-based World Modeling Codebase
Mihir Mahajan
Alfred Nguyen
Franz Srambical
Stefan Bauer
192
0
0
30 Oct 2025
LeMiCa: Lexicographic Minimax Path Caching for Efficient Diffusion-Based Video Generation
Huanlin Gao
Ping Chen
Fuyuan Shi
C. Tan
Zhaoxiang Liu
Fang Zhao
Kai Wang
Shiguo Lian
DiffM
VGen
345
0
0
30 Oct 2025
Denoising Refinement Diffusion Models for Simultaneous Generation of Multi-scale Mobile Network Traffic
Xiaoqian Qi
Haoye Chai
Sichang Liu
Lei Yue
Raoyuan Pan
Yue Wang
Yong Li
DiffM
89
0
0
30 Oct 2025
Co-Evolving Latent Action World Models
Yucen Wang
Fengming Zhang
De-Chuan Zhan
Li Zhao
Kaixin Wang
Jiang Bian
VGen
225
0
0
30 Oct 2025
Emu3.5: Native Multimodal Models are World Learners
Yufeng Cui
Honghao Chen
Haoge Deng
X. Y. Huang
Xinghang Li
...
Zhuo Chen
Yulong Ao
Tiejun Huang
Zhongyuan Wang
Xinlong Wang
MLLM
VGen
456
16
0
30 Oct 2025
OmniX: From Unified Panoramic Generation and Perception to Graphics-Ready 3D Scenes
Yukun Huang
Jiwen Yu
Yanning Zhou
Jianan Wang
Xintao Wang
Pengfei Wan
Xihui Liu
VGen
161
1
0
30 Oct 2025
ScaleDiff: Higher-Resolution Image Synthesis via Efficient and Model-Agnostic Diffusion
Sungho Koh
SeungJu Cha
Hyunwoo Oh
Kwanyoung Lee
Dong-Jin Kim
207
0
0
29 Oct 2025
VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning
Baolu Li
Y. Zhang
Qinghe Wang
Liqian Ma
Xiaoyu Shi
...
Pengfei Wan
Zhenfei Yin
Yunzhi Zhuge
Huchuan Lu
Xu Jia
VGen
232
4
0
29 Oct 2025
RegionE: Adaptive Region-Aware Generation for Efficient Image Editing
Pengtao Chen
Xianfang Zeng
Maosen Zhao
Mingzhu Shen
Peng Ye
Bangyin Xiang
Zhibo Wang
Wei Cheng
Gang Yu
Tao Chen
DiffM
305
1
0
29 Oct 2025
MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency
Nicolas Dufour
Lucas Degeorge
Arijit Ghosh
Vicky Kalogeiton
David Picard
EGVM
376
1
0
29 Oct 2025
Bayesian Speech synthesizers Can Learn from Multiple Teachers
Ziyang Zhang
Yifan Gao
Xuenan Xu
Baoxiangli
Wen Wu
Chao Zhang
93
0
0
28 Oct 2025
Neural USD: An object-centric framework for iterative editing and control
Alejandro Escontrela
Shrinu Kushagra
Sjoerd van Steenkiste
Yulia Rubanova
Aleksander Holynski
Kelsey R. Allen
Kevin Murphy
Thomas Kipf
DiffM
148
0
0
28 Oct 2025
Generative View Stitching
Chonghyuk Song
Michal Stary
Boyuan Chen
George Kopanas
Vincent Sitzmann
VGen
287
1
0
28 Oct 2025
ETC: training-free diffusion models acceleration with Error-aware Trend Consistency
Jiajian Xie
Hubery Yin
Chen Li
Zhou Zhao
Shengyu Zhang
DiffM
194
0
0
28 Oct 2025
Model-Guided Dual-Role Alignment for High-Fidelity Open-Domain Video-to-Audio Generation
Kang Zhang
T. Pham
Suyeon Lee
Axi Niu
Arda Senocak
Joon Son Chung
AuLLM
VGen
272
0
0
28 Oct 2025
Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance
Y. X. Wei
Shiwei Zhang
Hangjie Yuan
Yujin Han
Zhekai Chen
...
Difan Zou
Xihui Liu
Yingya Zhang
Yu Liu
Hongming Shan
DiffM
MoE
208
3
0
28 Oct 2025
VividCam: Learning Unconventional Camera Motions from Virtual Synthetic Videos
Qiucheng Wu
Handong Zhao
Zhixin Shu
Jing Shi
Yang Zhang
Shiyu Chang
DiffM
VGen
340
0
0
28 Oct 2025
Group Relative Attention Guidance for Image Editing
Xuanpu Zhang
Xuesong Niu
Ruidong Chen
Dan Song
Jianhao Zeng
Penghui Du
Haoxiang Cao
Kai Wu
An-an Liu
DiffM
210
0
0
28 Oct 2025
Decoupled MeanFlow: Turning Flow Models into Flow Maps for Accelerated Sampling
Kyungmin Lee
Sihyun Yu
Jinwoo Shin
AI4CE
242
3
0
28 Oct 2025
FreeFuse: Multi-Subject LoRA Fusion via Auto Masking at Test Time
Yaoli Liu
Yao-Xiang Ding
Kun Zhou
186
0
0
27 Oct 2025
More Than Generation: Unifying Generation and Depth Estimation via Text-to-Image Diffusion Models
Hongkai Lin
Dingkang Liang
Mingyang Du
Xin Zhou
X. Bai
MoMe
MDE
VLM
517
0
0
27 Oct 2025
Mixed-Density Diffuser: Efficient Planning with Non-Uniform Temporal Resolution
Crimson Stambaugh
Rajesh P. N. Rao
DiffM
218
0
0
27 Oct 2025
M$^{3}$T2IBench: A Large-Scale Multi-Category, Multi-Instance, Multi-Relation Text-to-Image Benchmark
Huixuan Zhang
Xiaojun Wan
VLM
255
0
0
27 Oct 2025
TRELLISWorld: Training-Free World Generation from Object Generators
Hanke Chen
Yuan Liu
Minchen Li
146
2
0
27 Oct 2025
Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method
Bohan Li
Xin Jin
Hu Zhu
Hongsi Liu
Ruikai Li
...
Chao Ma
Yueming Jin
Hao Zhao
Xiaokang Yang
Wenjun Zeng
153
1
0
27 Oct 2025
LightFusion: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation
Zeyu Wang
Z. Chen
Chenhui Gou
Feng Li
Chaorui Deng
...
Kunchang Li
Weihao Yu
Haoqin Tu
Haoqi Fan
Cihang Xie
361
0
0
27 Oct 2025
On the Anisotropy of Score-Based Generative Models
Andreas Floros
Seyed-Mohsen Moosavi-Dezfooli
Pier Luigi Dragotti
209
1
0
27 Oct 2025
Sampling from Energy distributions with Target Concrete Score Identity
Sergei Kholkin
Francisco Vargas
Alexander Korotin
141
0
0
27 Oct 2025
FARMER: Flow AutoRegressive Transformer over Pixels
Guangting Zheng
Qinyu Zhao
Tao Yang
Fei Xiao
Zhijie Lin
Jie Wu
Jiajun Deng
Y. Zhang
Rui Zhu
VGen
255
4
0
27 Oct 2025
RareFlow: Physics-Aware Flow-Matching for Cross-Sensor Super-Resolution of Rare-Earth Features
Forouzan Fallah
Wenwen Li
Chia-Yu Hsu
Hyunho Lee
Yezhou Yang
297
0
0
27 Oct 2025
Simple Denoising Diffusion Language Models
Huaisheng Zhu
Zhengyu Chen
Shijie Zhou
Zhihui Xie
Yige Yuan
Zhimeng Guo
Siyuan Xu
Hangfan Zhang
V. Honavar
Teng Xiao
DiffM
157
1
0
27 Oct 2025
Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation
Junyoung Seo
Rodrigo Mira
A. Haliassos
Stella Bounareli
Honglie Chen
Linh Tran
Seungryong Kim
Zoe Landgraf
Jie Shen
VGen
152
1
0
27 Oct 2025
A Survey on Efficient Vision-Language-Action Models
Zhaoshu Yu
Bo Wang
Pengpeng Zeng
Haonan Zhang
Ji Zhang
Lianli Gao
Jingkuan Song
Nicu Sebe
Heng Tao Shen
LM&Ro
202
5
0
27 Oct 2025
SAO-Instruct: Free-form Audio Editing using Natural Language Instructions
Michael Ungersböck
Florian Grötschla
Luca A. Lanzendörfer
June Young Yi
Changho Choi
Roger Wattenhofer
AuLLM
161
1
0
26 Oct 2025