2212.09748
Scalable Diffusion Models with Transformers
IEEE International Conference on Computer Vision (ICCV), 2023
19 December 2022
William S. Peebles
Saining Xie
GNN
HuggingFace (18 upvotes)
Papers citing "Scalable Diffusion Models with Transformers"
50 / 2,712 papers shown
Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning
Yuxuan Gu
Weimin Bai
Yifei Wang
Weijian Luo
H. Sun
DiffM
OffRL
257
0
0
19 Nov 2025
First Frame Is the Place to Go for Video Content Customization
Jingxi Chen
Z. Li
Zhichao Liu
Guangyao Shi
Xiyang Wu
Fuxiao Liu
Cornelia Fermüller
Brandon Yushan Feng
Yiannis Aloimonos
DiffM
VGen
207
0
0
19 Nov 2025
BD-Net: Has Depth-Wise Convolution Ever Been Applied in Binary Neural Networks?
DoYoung Kim
Jin-Seop Lee
Noo-Ri Kim
SungJoon Lee
Jee-Hyong Lee
MQ
162
3
0
19 Nov 2025
Insert In Style: A Zero-Shot Generative Framework for Harmonious Cross-Domain Object Composition
Raghu Chittersu
Yuvraj Singh Rathore
Pranav Adlinge
Kunal Swami
DiffM
268
0
0
19 Nov 2025
SplitFlux: Learning to Decouple Content and Style from a Single Image
Yitong Yang
Y Samuel Wang
Changshuo Wang
Yongjun Zhang
Ziyang Chen
Shuting He
231
1
0
19 Nov 2025
Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion
Zhuo Li
Junjia Liu
Zhipeng Dong
Tao Teng
Quentin Rouxel
D. Caldwell
Fei Chen
109
0
0
18 Nov 2025
Text2Traffic: A Text-to-Image Generation and Editing Method for Traffic Scenes
Feng Lv
Haoxuan Feng
Zilu Zhang
Chunlong Xia
Yanfeng Li
DiffM
310
0
0
17 Nov 2025
ActVAR: Activating Mixtures of Weights and Tokens for Efficient Visual Autoregressive Generation
Kaixin Zhang
Ruiqing Yang
Yuan Zhang
Shan You
Tao Huang
VLM
138
0
0
17 Nov 2025
Recurrent Autoregressive Diffusion: Global Memory Meets Local Attention
Taiye Chen
Zihan Ding
Anjian Li
Christina Zhang
Zeqi Xiao
Yisen Wang
Chi Jin
VGen
173
2
0
17 Nov 2025
GenTract: Generative Global Tractography
Alec Sargood
Lemuel Puglisi
Elinor Thompson
Mirco Musolesi
Daniel C. Alexander
MedIm
238
0
0
17 Nov 2025
Distribution Matching Distillation Meets Reinforcement Learning
Dengyang Jiang
Dongyang Liu
Zanyi Wang
Qilong Wu
Liuzhuozheng Li
...
Bo Zhang
Mengmeng Wang
Steven Hoi
Peng Gao
H. Yang
426
2
0
17 Nov 2025
MeanFlow Transformers with Representation Autoencoders
Zheyuan Hu
Chieh-Hsin Lai
Ge Wu
Yuki Mitsufuji
Stefano Ermon
226
1
0
17 Nov 2025
Towards High-Consistency Embodied World Model with Multi-View Trajectory Videos
Taiyi Su
Jian Zhu
Yaxuan Li
Chong Ma
Zitai Huang
Yichen Zhu
Hanli Wang
VGen
260
0
0
17 Nov 2025
Training-Free Multi-View Extension of IC-Light for Textual Position-Aware Scene Relighting
Jiangnan Ye
Jiedong Zhuang
Lianrui Mu
Wenjie Zheng
Jiaqi Hu
Xingze Zou
Jing Wang
Haoji Hu
3DGS
185
0
0
17 Nov 2025
Generative Photographic Control for Scene-Consistent Video Cinematic Editing
Huiqiang Sun
Liao Shen
Zhan Peng
Kun Wang
Size Wu
...
Z. Huang
Xingyu Zeng
Zhiguo Cao
Wei Li
Chen Change Loy
DiffM
VGen
177
0
0
17 Nov 2025
DINO-Detect: A Simple yet Effective Framework for Blur-Robust AI-Generated Image Detection
Jialiang Shen
Jiyang Zheng
Yunqi Xue
Huajie Chen
Yu Yao
...
Ruiqi Liu
Helin Gong
Yang Yang
Dadong Wang
Tongliang Liu
243
0
0
16 Nov 2025
TempoMaster: Efficient Long Video Generation via Next-Frame-Rate Prediction
Yukuo Ma
Cong Liu
Junke Wang
J. Liu
Haibin Huang
Zuxuan Wu
C. Zhang
Xuelong Li
VGen
119
1
0
16 Nov 2025
GeoMVD: Geometry-Enhanced Multi-View Generation Model Based on Geometric Information Extraction
Jiaqi Wu
Yaosen Chen
Shuyuan Zhu
VGen
315
0
0
15 Nov 2025
TIMERIPPLE: Accelerating vDiTs by Understanding the Spatio-Temporal Correlations in Latent Space
Wenxuan Miao
Yulin Sun
Aiyue Chen
Jing Lin
Yiwu Yao
Yiming Gan
Jieru Zhao
Jingwen Leng
Mingyi Guo
Yu Feng
199
0
0
15 Nov 2025
ProAV-DiT: A Projected Latent Diffusion Transformer for Efficient Synchronized Audio-Video Generation
Jiahui Sun
Weining Wang
Mingzhen Sun
Y. Yang
Xinxin Zhu
Jing Liu
DiffM
VGen
195
0
0
15 Nov 2025
Adaptive Begin-of-Video Tokens for Autoregressive Video Diffusion Models
Tianle Cheng
Zeyan Zhang
Kaifeng Gao
Jun Xiao
DiffM
VGen
260
0
0
15 Nov 2025
Mixture of States: Routing Token-Level Dynamics for Multimodal Generation
Haozhe Liu
Ding Liu
Mingchen Zhuge
Zijian Zhou
Tian Xie
...
Juan-Manuel Perez-Rua
Tao Xiang
Wei Liu
Shikun Liu
Jürgen Schmidhuber
105
0
0
15 Nov 2025
A Best-of-Both-Worlds Proof for Tsallis-INF without Fenchel Conjugates
Wei-Cheng Lee
Francesco Orabona
126
19
0
14 Nov 2025
Depth Anything 3: Recovering the Visual Space from Any Views
Haotong Lin
Sili Chen
Junhao Liew
Donny Y. Chen
Z. Li
Guang Shi
Jiashi Feng
Bingyi Kang
3DV
VLM
MDE
713
17
0
13 Nov 2025
nuPlan-R: A Closed-Loop Planning Benchmark for Autonomous Driving via Reactive Multi-Agent Simulation
Mingxing Peng
Ruoyu Yao
Xusen Guo
Jun Ma
308
0
0
13 Nov 2025
Generative AI Meets 6G and Beyond: Diffusion Models for Semantic Communications
Hai-Long Qin
Jincheng Dai
Guo Lu
Shuo Shao
Sixian Wang
Tongda Xu
Wenjun Zhang
Ping Zhang
Khaled B. Letaief
DiffM
VLM
429
0
0
11 Nov 2025
oboro: Text-to-Image Synthesis on Limited Data using Flow-based Diffusion Transformer with MMH Attention
Ryusuke Mizutani
Kazuaki Matano
Tsugumi Kadowaki
Haruki Tenya
Layris
nuigurumi
Koki Hashimoto
Yu Tanaka
169
0
0
11 Nov 2025
Beyond Randomness: Understand the Order of the Noise in Diffusion
Song Yan
Min Li
Bi Xinliang
J. Yang
Yusen Zhang
Guanye Xiong
Yunwei Lan
Tao Zhang
Wei Zhai
Zheng-jun Zha
DiffM
322
0
0
11 Nov 2025
Simulating the Visual World with Artificial Intelligence: A Roadmap
Jingtong Yue
Z. Huang
Z. Chen
Xintao Wang
Pengfei Wan
Ziwei Liu
VGen
LM&Ro
489
1
0
11 Nov 2025
E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis
Zhisheng Zhang
Derui Wang
Yifan Mi
Zhiyong Wu
Jie Gao
Yuxin Cao
Kai Ye
Minhui Xue
Jie Hao
AAML
189
0
0
10 Nov 2025
GNN-Enabled Robust Hybrid Beamforming with Score-Based CSI Generation and Denoising
Yuhang Li
Yang Lu
B. Ai
Z. Ding
Dusit Niyato
A. Nallanathan
DiffM
60
1
0
10 Nov 2025
RelightMaster: Precise Video Relighting with Multi-plane Light Images
Weikang Bian
Xiaoyu Shi
Z. Huang
J. Bai
Qinghe Wang
Xintao Wang
Pengfei Wan
Kun Gai
Jiaming Song
VGen
230
2
0
09 Nov 2025
Latent Refinement via Flow Matching for Training-free Linear Inverse Problem Solving
Hossein Askari
Yadan Luo
Hongfu Sun
Fred Roosta
187
0
0
08 Nov 2025
Neodragon: Mobile Video Generation using Diffusion Transformer
Animesh Karnewar
Denis Korzhenkov
Ioannis Lelekas
Adil Karjauv
Noor Fathima
...
Rafael Esteves
Tushar Singhal
Fatih Porikli
Mohsen Ghafoorian
A. Habibian
DiffM
VGen
160
2
0
08 Nov 2025
Enhancing Diffusion Model Guidance through Calibration and Regularization
Seyed Alireza Javid
Amirhossein Bagheri
Nuria González-Prelcic
194
0
0
08 Nov 2025
FreeControl: Efficient, Training-Free Structural Control via One-Step Attention Extraction
Jiang Lin
Xinyu Chen
Song Wu
Zhiqiu Zhang
Jizhi Zhang
Ye Wang
Qiang Tang
Qian Wang
Jian Yang
Zili Yi
DiffM
134
0
0
07 Nov 2025
Rethinking Metrics and Diffusion Architecture for 3D Point Cloud Generation
Matteo Bastico
David Ryckelynck
Laurent Corté
Yannick Tillier
Etienne Decencière
328
0
0
07 Nov 2025
VLM-driven Skill Selection for Robotic Assembly Tasks
Jeong-Jung Kim
Doo-Yeol Koh
Chang-Hyun Kim
92
0
0
07 Nov 2025
TwinVLA: Data-Efficient Bimanual Manipulation with Twin Single-Arm Vision-Language-Action Models
Hokyun Im
Euijin Jeong
Jianlong Fu
Andrey Kolobov
Youngwoon Lee
84
0
0
07 Nov 2025
On Flow Matching KL Divergence
Maojiang Su
Jerry Yao-Chieh Hu
Sophia Pi
Han Liu
342
0
0
07 Nov 2025
MusRec: Zero-Shot Text-to-Music Editing via Rectified Flow and Diffusion Transformers
Ali Boudaghi
Hadi Zare
348
0
0
06 Nov 2025
InfinityStar: Unified Spacetime AutoRegressive Modeling for Visual Generation
Jinlai Liu
J. N. Han
B. Yan
Hui Wu
Fengda Zhu
Xing-Hui Wang
Yi Jiang
Bingyue Peng
Zehuan Yuan
VGen
271
5
0
06 Nov 2025
Evo-1: Lightweight Vision-Language-Action Model with Preserved Semantic Alignment
Tao Lin
Yilei Zhong
Yuxin Du
Jingjing Zhang
Jiting Liu
...
Yanwen Zou
Lixing Zou
Zhaoye Zhou
Gen Li
Bo Zhao
VLM
169
4
0
06 Nov 2025
Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models
Minghao Fu
Guo-Hua Wang
Tianyu Cui
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
269
2
0
05 Nov 2025
UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions
Guozhen Zhang
Z. Zhou
Teng Hu
Ziqiao Peng
Y. Zhang
Yihao Chen
Yuan Zhou
Qinglin Lu
Limin Wang
DiffM
VGen
242
5
0
05 Nov 2025
Scalable Single-Cell Gene Expression Generation with Latent Diffusion Models
Giovanni Palla
Sudarshan Babu
Payam Dibaeinia
James D. Pearce
Donghui Li
Aly A. Khan
Theofanis Karaletsos
Jakub M. Tomczak
179
1
0
04 Nov 2025
Towards One-step Causal Video Generation via Adversarial Self-Distillation
Yongqi Yang
Huayang Huang
Xu Peng
Xiaobin Hu
Donghao Luo
Jiangning Zhang
Chengjie Wang
Yu Wu
DiffM
VGen
206
3
0
03 Nov 2025
Diffusion Transformer meets Multi-level Wavelet Spectrum for Single Image Super-Resolution
Peng Du
Hui Li
Han Xu
Paul Barom Jeon
Dongwook Lee
Daehyun Ji
Ran Yang
Feng Zhu
412
0
0
03 Nov 2025
Lightweight Learning from Actuation-Space Demonstrations via Flow Matching for Whole-Body Soft Robotic Grasping
Liudi Yang
Yang Bai
Yuhao Wang
Ibrahim Alsarraj
Gitta Kutyniok
Z. Wang
Ke Wu
160
0
0
03 Nov 2025
Fractional Diffusion Bridge Models
Gabriel Nobis
Maximilian Springenberg
Arina Belova
Rembert Daems
Christoph Knochenhauer
Manfred Opper
Tolga Birdal
Wojciech Samek
DiffM
163
0
0
03 Nov 2025