Scalable Diffusion Models with Transformers

IEEE International Conference on Computer Vision (ICCV), 2023
19 December 2022
William S. Peebles
Saining Xie
    GNN

Papers citing "Scalable Diffusion Models with Transformers"

50 / 2,711 papers shown
BlockVid: Block Diffusion for High-Quality and Consistent Minute-Long Video Generation
Zeyu Zhang
Shuning Chang
Yuanyu He
Yizeng Han
Jiasheng Tang
Fan Wang
Bohan Zhuang
DiffM, VGen
187
2
0
28 Nov 2025
Flow Straighter and Faster: Efficient One-Step Generative Modeling via MeanFlow on Rectified Trajectories
Xinxi Zhang
Shiwei Tan
Quang Nguyen
Quan Dao
Ligong Han
Xiaoxiao He
Tunyu Zhang
Alen Mrdovic
Dimitris N. Metaxas
255
1
0
28 Nov 2025
AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement
Zhizhou Zhong
Yicheng Ji
Zhe Kong
Y. Liu
Jiarui Wang
...
Ying Qin
Huan Li
Shuiyang Mao
W. Liu
Wenhan Luo
DiffM, VGen
116
1
0
28 Nov 2025
VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction
Sinan Du
Jiahao Guo
Bo Li
Shuhao Cui
Zhengzhuo Xu
...
Yongxian Wei
Kun Gai
X. Wang
Kai Wu
C. Yuan
213
0
0
28 Nov 2025
McSc: Motion-Corrective Preference Alignment for Video Generation with Self-Critic Hierarchical Reasoning
Q. Yang
Yingjie Chen
Yuan Yao
Yifang Men
Huaizhuo Liu
Miaomiao Cui
EGVM, VGen
237
0
0
28 Nov 2025
db-SP: Accelerating Sparse Attention for Visual Generative Models with Dual-Balanced Sequence Parallelism
Siqi Chen
Ke Hong
Tianchen Zhao
Ruiqi Xie
Zhenhua Zhu
X. Zhang
Yu Wang
MoE
109
0
0
28 Nov 2025
DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation
Hongfei Zhang
Kanghao Chen
Zixin Zhang
Harold Haodong Chen
Yuanhuiyi Lyu
Yuqi Zhang
Shuai Yang
Kun Zhou
Yingcong Chen
DiffM, VGen
175
1
0
28 Nov 2025
GOATex: Geometry & Occlusion-Aware Texturing
Hyunjin Kim
Kunho Kim
Adam Lee
Wonkwang Lee
DiffM
101
0
0
28 Nov 2025
Vision Bridge Transformer at Scale
Zhenxiong Tan
Zeqing Wang
Xingyi Yang
Songhua Liu
Xinchao Wang
DiffM
100
0
0
28 Nov 2025
One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer
S. Shi
Jing Xu
Zhihang Li
Chunli Peng
Xiaoda Yang
Lijing Lu
Kai Hu
Jiangning Zhang
DiffM
119
0
0
28 Nov 2025
DisMo: Disentangled Motion Representations for Open-World Motion Transfer
Thomas Ressler-Antal
Frank Fundel
Malek Ben Alaya
S. A. Baumann
Felix Krause
Ming Gui
Bjorn Ommer
DiffM, VGen
105
0
0
28 Nov 2025
Scalable Diffusion Transformer for Conditional 4D fMRI Synthesis
Jungwoo Seo
David K. Park
Shinjae Yoo
Jiook Cha
MedIm
258
0
0
28 Nov 2025
Guiding Visual Autoregressive Models through Spectrum Weakening
Chaoyang Wang
Tianmeng Yang
Jingdong Wang
Yunhai Tong
DiffM
168
0
0
28 Nov 2025
InstanceV: Instance-Level Video Generation
Yuheng Chen
Teng Hu
Jiangning Zhang
Zhucun Xue
Ran Yi
Lizhuang Ma
DiffM, VGen
120
0
0
28 Nov 2025
ReasonEdit: Towards Reasoning-Enhanced Image Editing Models
Fukun Yin
Shiyu Liu
Yucheng Han
Zhibo Wang
Peng Xing
...
Pengtao Chen
Xiangyu Zhang
Daxin Jiang
Xianfang Zeng
Gang Yu
DiffM, KELM, LRM
241
0
0
27 Nov 2025
TTSnap: Test-Time Scaling of Diffusion Models via Noise-Aware Pruning
Qingtao Yu
Changlin Song
Minghao Sun
Zhengyang Yu
Vinay Kumar Verma
Soumya Roy
Sumit Negi
Hongdong Li
Dylan Campbell
96
0
0
27 Nov 2025
ITS3D: Inference-Time Scaling for Text-Guided 3D Diffusion Models
Zhenglin Zhou
Fan Ma
Xiaobo Xia
Hehe Fan
Yi Yang
Tat-Seng Chua
DiffM, 3DGS
121
0
0
27 Nov 2025
Generative Anchored Fields: Controlled Data Generation via Emergent Velocity Fields and Transport Algebra
Deressa Wodajo Deressa
Hannes Mareen
Peter Lambert
Glenn Van Wallendael
64
0
0
27 Nov 2025
StreamFlow: Theory, Algorithm, and Implementation for High-Efficiency Rectified Flow Generation
Sen Fang
Hongbin Zhong
Yalin Feng
Dimitris N. Metaxas
154
1
0
27 Nov 2025
Adversarial Flow Models
Shanchuan Lin
Ceyuan Yang
Zhijie Lin
Hao Chen
Haoqi Fan
GAN
145
0
0
27 Nov 2025
IMTalker: Efficient Audio-driven Talking Face Generation with Implicit Motion Transfer
Bo Chen
Tao Liu
Qi Chen
Xie Chen
Zilong Zheng
VGen
92
0
0
27 Nov 2025
Toward Diffusible High-Dimensional Latent Spaces: A Frequency Perspective
Bolin Lai
Xudong Wang
Saketh Rambhatla
James M. Rehg
Zsolt Kira
Rohit Girdhar
Ishan Misra
DiffM
133
0
0
27 Nov 2025
Which Layer Causes Distribution Deviation? Entropy-Guided Adaptive Pruning for Diffusion and Flow Models
Changlin Li
Jiawei Zhang
Z. Shi
Zongxin Yang
Zhihui Li
Xiaojun Chang
DiffM, VLM
261
0
0
26 Nov 2025
SocialNav: Training Human-Inspired Foundation Model for Socially-Aware Embodied Navigation
Ziyi Chen
Yingnan Guo
Zedong Chu
Minghua Luo
Yanfen Shen
...
Lu Liu
Honglin Han
X. Wu
Mu Xu
Yu Zhang
536
0
0
26 Nov 2025
FaithFusion: Harmonizing Reconstruction and Generation via Pixel-wise Information Gain
Y. Wang
Xiaofan Li
Chi Huang
Wenhao Zhang
Hao Li
Bosheng Wang
Xun Sun
Jun Wang
DiffM
199
0
0
26 Nov 2025
Deep Parameter Interpolation for Scalar Conditioning
Chicago Y. Park
Michael T. McCann
Cristina Garcia-Cardona
B. Wohlberg
Ulugbek S. Kamilov
AI4CE
277
0
0
26 Nov 2025
MobileI2V: Fast and High-Resolution Image-to-Video on Mobile Devices
Shuai Zhang
Bao Tang
Siyuan Yu
Yueting Zhu
Jingfeng Yao
Ya Zou
Shanglin Yuan
Li Yu
Wenyu Liu
Xinggang Wang
DiffM, VGen
204
0
0
26 Nov 2025
Saddle-Free Guidance: Improved On-Manifold Sampling without Labels or Additional Training
Eric Yeats
Darryl Hannan
Wilson Fearn
T. Doster
Henry Kvinge
Scott Mahan
DiffM
124
0
0
26 Nov 2025
MoGAN: Improving Motion Quality in Video Diffusion via Few-Step Motion Adversarial Post-Training
Haotian Xue
Qi-An Chen
Zhonghao Wang
Xun Huang
Eli Shechtman
Jinrong Xie
Yongxin Chen
DiffM, VGen
529
0
0
26 Nov 2025
Canvas-to-Image: Compositional Image Generation with Multimodal Controls
Yusuf Dalva
Guocheng Qian
Maya Goldenberg
Tsai-Shien Chen
Kfir Aberman
Sergey Tulyakov
Pinar Yanardag
Kuan-Chieh Wang
DiffM
199
0
0
26 Nov 2025
Going with the Speed of Sound: Pushing Neural Surrogates into Highly-turbulent Transonic Regimes
Fabian Paischer
Leo Cotteleer
Yann Dreze
Richard Kurle
Dylan Rubini
Maurits Bleeker
Tobias Kronlachner
Johannes Brandstetter
AI4CE
216
1
0
26 Nov 2025
DiverseVAR: Balancing Diversity and Quality of Next-Scale Visual Autoregressive Models
Mingue Park
Prin Phunyaphibarn
Phillip Y. Lee
Minhyuk Sung
112
0
0
26 Nov 2025
3MDiT: Unified Tri-Modal Diffusion Transformer for Text-Driven Synchronized Audio-Video Generation
Y. Li
Heyu Si
Federico Landi
Pilar Oplustil Gallegos
Ioannis Koutsoumpas
...
Ruiju Fu
Qi Guo
Xin Jin
Shunyu Liu
Mingli Song
DiffM, VGen
192
0
0
26 Nov 2025
Efficient Training for Human Video Generation with Entropy-Guided Prioritized Progressive Learning
Changlin Li
Jiawei Zhang
Shuhao Liu
Sihao Lin
Z. Shi
Zhihui Li
Xiaojun Chang
DiffM, VGen
263
0
0
26 Nov 2025
DINO-Tok: Adapting DINO for Visual Tokenizers
Mingkai Jia
Mingxiao Li
Liaoyuan Fan
Tianxing Shi
Jiaxin Guo
...
Xiaoyang Guo
Xiao-Xiao Long
Qian Zhang
P. Tan
Wei Yin
ViT
192
0
0
25 Nov 2025
MotionV2V: Editing Motion in a Video
R. Burgert
Charles Herrmann
Forrester Cole
Michael S. Ryoo
Neal Wadhwa
Andrey Voynov
Nataniel Ruiz
DiffM, VGen
239
0
0
25 Nov 2025
Low-Resolution Editing is All You Need for High-Resolution Editing
J. Lee
Hyunsoo Lee
Yong Jae Lee
Bohyung Han
DiffM
222
0
0
25 Nov 2025
Back to the Feature: Explaining Video Classifiers with Video Counterfactual Explanations
Chao Wang
Chengan Che
Xinyue Chen
Sophia Tsoka
Luis C. Garcia-Peraza-Herrera
235
0
0
25 Nov 2025
Rectified SpaAttn: Revisiting Attention Sparsity for Efficient Video Generation
Xuewen Liu
Zhikai Li
Jing Zhang
Mengjuan Chen
Qingyi Gu
VGen
137
0
0
25 Nov 2025
Temporal-Visual Semantic Alignment: A Unified Architecture for Transferring Spatial Priors from Vision Models to Zero-Shot Temporal Tasks
Xiangkai Ma
Han Zhang
Wenzhong Li
Sanglu Lu
AI4TS, VGen
270
0
0
25 Nov 2025
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation
Inferix Team
Tianyu Feng
Yizeng Han
Jiahao He
Yuanyu He
...
Jichao Wu
M. Yang
Yinghao Yu
Zeyu Zhang
Bohan Zhuang
VGen, SyDa
322
1
0
25 Nov 2025
PromptMoG: Enhancing Diversity in Long-Prompt Image Generation via Prompt Embedding Mixture-of-Gaussian Sampling
Bo-Kai Ruan
Teng-Fang Hsiao
Ling Lo
Yi-Lun Wu
Hong-Han Shuai
DiffM, VLM
185
0
0
25 Nov 2025
Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout
Hidir Yesiltepe
Tuna Han Salih Meral
Adil Kaan Akan
Kaan Oktay
Pinar Yanardag
VGen
217
3
0
25 Nov 2025
A Training-Free Approach for Multi-ID Customization via Attention Adjustment and Spatial Control
Jiawei Lin
Guanlong Jiao
Jianjin Xu
276
0
0
25 Nov 2025
A Reason-then-Describe Instruction Interpreter for Controllable Video Generation
Shengqiong Wu
Weicai Ye
Y. Zhang
Jiahao Wang
Quande Liu
Xintao Wang
Pengfei Wan
Kun Gai
Hao Fei
Tat-Seng Chua
VGen, LRM
185
0
0
25 Nov 2025
DUO-TOK: Dual-Track Semantic Music Tokenizer for Vocal-Accompaniment Generation
Rui Lin
Zhiyue Wu
Jiahe Le
Kangdi Wang
Weixiong Chen
Junyu Dai
Tao Jiang
168
1
0
25 Nov 2025
OmniRefiner: Reinforcement-Guided Local Diffusion Refinement
Yaoli Liu
Ziheng Ouyang
Shengtao Lou
Yiren Song
205
0
0
25 Nov 2025
Layer-Aware Video Composition via Split-then-Merge
Ozgur Kara
Yujia Chen
Ming-Hsuan Yang
James M. Rehg
Wen-Sheng Chu
Du Tran
VGen
172
0
0
25 Nov 2025
STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flows
Jiatao Gu
Ying Shen
Tianrong Chen
Laurent Dinh
Y. Wang
Miguel Angel Bautista
David Berthelot
Josh Susskind
Shuangfei Zhai
DiffM, VGen
303
3
0
25 Nov 2025
Exo2EgoSyn: Unlocking Foundation Video Generation Models for Exocentric-to-Egocentric Video Synthesis
Mohammad Mahdi
Yuqian Fu
N. Savov
Jiancheng Pan
Danda Pani Paudel
Luc Van Gool
VGen
215
1
0
25 Nov 2025