Scalable Diffusion Models with Transformers

IEEE International Conference on Computer Vision (ICCV), 2022
19 December 2022
William S. Peebles
Saining Xie
    GNN

Papers citing "Scalable Diffusion Models with Transformers"

50 / 2,712 papers shown
FlexiQ: Adaptive Mixed-Precision Quantization for Latency/Accuracy Trade-Offs in Deep Neural Networks
Jaemin Kim
Hongjun Um
Sungkyun Kim
Yongjun Park
Jiwon Seo
MQ
146
0
0
03 Oct 2025
SALSA-V: Shortcut-Augmented Long-form Synchronized Audio from Videos
Amir Dellali
Luca A. Lanzendörfer
Florian Grötschla
Roger Wattenhofer
VGen
122
0
0
03 Oct 2025
Paris: A Decentralized Trained Open-Weight Diffusion Model
Zhiying Jiang
Raihan Seraj
Marcos Villagra
Bidhan Roy
MoE
88
0
0
03 Oct 2025
Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction
Kaisi Guan
Xihua Wang
Zhengfeng Lai
Xin Cheng
Peng Zhang
Xiaojiang Liu
Ruihua Song
Meng Cao
DiffM
267
4
0
03 Oct 2025
Best-of-Majority: Minimax-Optimal Strategy for Pass@$k$ Inference Scaling
Qiwei Di
Kaixuan Ji
Xuheng Li
Heyang Zhao
Quanquan Gu
113
1
0
03 Oct 2025
When and Where do Events Switch in Multi-Event Video Generation?
Ruotong Liao
Guowen Huang
Qing Cheng
Thomas Seidl
Daniel Cremers
Volker Tresp
DiffM, VGen
213
0
0
03 Oct 2025
Growing Visual Generative Capacity for Pre-Trained MLLMs
Hanyu Wang
Jiaming Han
Ziyan Yang
Qi Zhao
Shanchuan Lin
Xiangyu Yue
Abhinav Shrivastava
Zhenheng Yang
Hao Chen
VLM
204
0
0
02 Oct 2025
Diffusion Transformers for Imputation: Statistical Efficiency and Uncertainty Quantification
Zeqi Ye
Minshuo Chen
152
0
0
02 Oct 2025
Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models
Runqian Wang
Yilun Du
DiffM
896
3
0
02 Oct 2025
Contrastive Representation Regularization for Vision-Language-Action Models
Taeyoung Kim
J. Lee
Myungkyu Koo
Dongyoung Kim
Kyungmin Lee
Changyeon Kim
Younggyo Seo
Jinwoo Shin
232
1
0
02 Oct 2025
Fine-Grained GRPO for Precise Preference Alignment in Flow Models
Yujie Zhou
Pengyang Ling
Jiazi Bu
Yibin Wang
Yuhang Zang
Jiaqi Wang
Li Niu
Guangtao Zhai
DiffM
234
3
0
02 Oct 2025
Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity
Eric Tillmann Bill
Enis Simsar
Thomas Hofmann
DiffM
353
0
0
02 Oct 2025
Input-Aware Sparse Attention for Real-Time Co-Speech Video Generation
Beijia Lu
Ziyi Chen
Jing Xiao
Jun-Yan Zhu
DiffM, VGen
327
0
0
02 Oct 2025
Zero-shot Human Pose Estimation using Diffusion-based Inverse solvers
Sahil Bhandary Karnoor
Romit Roy Choudhury
DiffM
150
0
0
02 Oct 2025
FreeViS: Training-free Video Stylization with Inconsistent References
Jiacong Xu
Yiqun Mei
Ke Zhang
Vishal M. Patel
DiffM, VGen
208
2
0
02 Oct 2025
Learning to Generate Rigid Body Interactions with Video Diffusion Models
David Romero
Ariana Bermúdez
Hao Li
Fabio Pizzati
Ivan Laptev
DiffM, VGen
458
0
0
02 Oct 2025
Pack and Force Your Memory: Long-form and Consistent Video Generation
Xiaofei Wu
Guozhen Zhang
Zhiyong Xu
Yuan Zhou
Qinglin Lu
Xuming He
VGen
244
3
0
02 Oct 2025
DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing
Zihan Zhou
Shilin Lu
Shuli Leng
Shaocong Zhang
Zhuming Lian
Xinlei Yu
A. Kong
DiffM
313
7
0
02 Oct 2025
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation
Justin Cui
Jie Wu
Ming Li
Tao Yang
Xiaojie Li
Rui Wang
Andrew Bai
Yuanhao Ban
Cho-Jui Hsieh
DiffM, VGen
231
32
0
02 Oct 2025
NoiseShift: Resolution-Aware Noise Recalibration for Better Low-Resolution Image Generation
Ruozhen He
Moayed Haji-Ali
Ziyan Yang
Vicente Ordonez
DiffM
155
0
0
02 Oct 2025
InfVSR: Breaking Length Limits of Generic Video Super-Resolution
Ziqing Zhang
Kai Liu
Zheng Chen
X. Li
Yihao Chen
Bingnan Duan
Linghe Kong
Yulun Zhang
165
2
0
01 Oct 2025
Image Generation Based on Image Style Extraction
Shuochen Chang
128
0
0
01 Oct 2025
LVTINO: LAtent Video consisTency INverse sOlver for High Definition Video Restoration
Alessio Spagnoletti
Andrés Almansa
Marcelo Pereyra
DiffM, VGen
189
0
0
01 Oct 2025
Fine-Tuning Masked Diffusion for Provable Self-Correction
Jaeyeon Kim
Seunggeun Kim
Taekyun Lee
David Z. Pan
Hyeji Kim
Sham Kakade
Sitan Chen
DiffM
304
2
0
01 Oct 2025
Cascaded Diffusion Framework for Probabilistic Coarse-to-Fine Hand Pose Estimation
Taeyun Woo
Jinah Park
Tae-Kyun Kim
DiffM
149
0
0
01 Oct 2025
Learn to Guide Your Diffusion Model
Alexandre Galashov
Ashwini Pokle
Arnaud Doucet
Arthur Gretton
Mauricio Delbracio
Valentin De Bortoli
DiffM
448
0
0
01 Oct 2025
Purrception: Variational Flow Matching for Vector-Quantized Image Generation
Răzvan-Andrei Matişan
Vincent Tao Hu
Grigory Bartosh
Bjorn Ommer
Cees G. M. Snoek
Max Welling
Jan-Willem van de Meent
Mohammad Mahdi Derakhshani
Floor Eijkelboom
147
1
0
01 Oct 2025
IMAGEdit: Let Any Subject Transform
Fei Shen
Weihao Xu
Rui Yan
Dong Zhang
Xiangbo Shu
Jinhui Tang
VGen
122
1
0
01 Oct 2025
Arbitrary Generative Video Interpolation
Guozhen Zhang
Haiguang Wang
C. Wang
Yuan Zhou
Qinglin Lu
Limin Wang
VGen
157
0
0
01 Oct 2025
Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling
Huangjie Zheng
Shansan Gong
Ruixiang Zhang
Tianrong Chen
Jiatao Gu
Mingyuan Zhou
Navdeep Jaitly
Y. Zhang
DiffM
280
6
0
01 Oct 2025
Syntax-Guided Diffusion Language Models with User-Integrated Personalization
Ruqian Zhang
Yijiao Zhang
Juan Shen
Zhongyi Zhu
Annie Qu
DiffM
130
0
0
01 Oct 2025
BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration
Zhaoyang Li
Dongjun Qian
Kai Su
Qishuai Diao
Xiangyang Xia
Chang Liu
Wenfei Yang
Tianzhu Zhang
Zehuan Yuan
DiffM, VGen
136
3
0
01 Oct 2025
Selective Underfitting in Diffusion Models
Kiwhan Song
Jaeyeon Kim
Sitan Chen
Yilun Du
Sham Kakade
Vincent Sitzmann
DiffM
144
5
0
01 Oct 2025
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators
Hengtao Li
Pengxiang Ding
Runze Suo
Yihao Wang
Zirui Ge
...
Kexian Yu
Mingyang Sun
Hongyin Zhang
Donglin Wang
Weihua Su
149
9
0
01 Oct 2025
PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution
Computer Vision and Pattern Recognition (CVPR), 2025
S. Du
Menghan Xia
Chang Liu
Xintao Wang
Jing Wang
Pengfei Wan
Di Zhang
Xiangyang Ji
DiffM, SupR, VGen
299
3
0
30 Sep 2025
Query-Kontext: An Unified Multimodal Model for Image Generation and Editing
Yuxin Song
Wenkai Dong
Shizun Wang
Qi Zhang
Song Xue
...
H. Yang
Haocheng Feng
Hang Zhou
Xinyan Xiao
Jingdong Wang
DiffM, MLLM
153
5
0
30 Sep 2025
LieHMR: Autoregressive Human Mesh Recovery with $SO(3)$ Diffusion
Donghwan Kim
Tae-Kyun Kim
DiffM
212
0
0
30 Sep 2025
Refine Drugs, Don't Complete Them: Uniform-Source Discrete Flows for Fragment-Based Drug Discovery
Benno Kaech
Luis Wyss
Karsten Borgwardt
Gianvito Grasso
95
0
0
30 Sep 2025
Post-Training Quantization for Audio Diffusion Transformers
Tanmay Khandelwal
Magdalena Fuentes
MQ
117
0
0
30 Sep 2025
OmniNav: A Unified Framework for Prospective Exploration and Visual-Language Navigation
Xinda Xue
Junjun Hu
Minghua Luo
Xie Shichao
Jintao Chen
Zixun Xie
Quan Kuichen
Guo Wei
Mu Xu
Zedong Chu
298
8
0
30 Sep 2025
AReUReDi: Annealed Rectified Updates for Refining Discrete Flows with Multi-Objective Guidance
Tong Chen
Yinuo Zhang
Pranam Chatterjee
170
3
0
30 Sep 2025
LTA-L2S: Lexical Tone-Aware Lip-to-Speech Synthesis for Mandarin with Cross-Lingual Transfer Learning
Kang Yang
Yifan Liang
Fangkun Liu
Zhenping Xie
C. Zheng
104
0
0
30 Sep 2025
DA$^{2}$: Depth Anything in Any Direction
Haodong Li
Wangguangdong Zheng
Jing He
Yuhao Liu
Xin Lin
Xin Yang
Ying-Cong Chen
Chunchao Guo
MDE
479
4
0
30 Sep 2025
AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size
Guanxi Lu
Hao Mark Chen
Yuto Karashima
Zhican Wang
Daichi Fujiki
Hongxiang Fan
AI4CE
123
4
0
30 Sep 2025
Flow Autoencoders are Effective Protein Tokenizers
Rohit Dilip
Evan Zhang
Ayush Varshney
David Van Valen
DiffM
125
0
0
30 Sep 2025
MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation
Chenhui Zhu
Yilu Wu
Shuai Wang
Gangshan Wu
Limin Wang
DiffM, VGen
125
1
0
30 Sep 2025
Stitch: Training-Free Position Control in Multimodal Diffusion Transformers
Jessica Bader
Mateusz Pach
Maria A. Bravo
Serge Belongie
Zeynep Akata
155
1
0
30 Sep 2025
Video Object Segmentation-Aware Audio Generation
Ilpo Viertola
Vladimir E. Iashin
Esa Rahtu
DiffM, VOS, VGen
184
1
0
30 Sep 2025
Post-Training Quantization via Residual Truncation and Zero Suppression for Diffusion Models
Donghoon Kim
Dongyoung Lee
Ik Joon Chang
Sung-Ho Bae
MQ
152
0
0
30 Sep 2025
Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation
Chetwin Low
Weimin Wang
Calder Katyal
DiffM, VGen
156
10
0
30 Sep 2025