Scalable Diffusion Models with Transformers

IEEE International Conference on Computer Vision (ICCV), 2023
19 December 2022
William S. Peebles
Saining Xie
    GNN
ArXiv (abs) | PDF | HTML | HuggingFace (18 upvotes)

Papers citing "Scalable Diffusion Models with Transformers"

50 / 2,711 papers shown
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
Fuhao Li
Wenxuan Song
Han Zhao
Jingbo Wang
Pengxiang Ding
Donglin Wang
Long Zeng
Haoang Li
207
6
0
14 Oct 2025
EReLiFM: Evidential Reliability-Aware Residual Flow Meta-Learning for Open-Set Domain Generalization under Noisy Labels
Kunyu Peng
Di Wen
Kailun Yang
Jia Fu
Yufan Chen
...
Junwei Zheng
M. Sarfraz
Luc Van Gool
Danda Pani Paudel
Rainer Stiefelhagen
217
0
0
14 Oct 2025
PAINT: Parallel-in-time Neural Twins for Dynamical System Reconstruction
Andreas Radler
Vincent Seyfried
Stefan Pirker
Johannes Brandstetter
Thomas Lichtenegger
138
1
0
14 Oct 2025
SceneAdapt: Scene-aware Adaptation of Human Motion Diffusion
Jungbin Cho
Minsu Kim
Jisoo Kim
Ce Zheng
László A. Jeni
Ming-Hsuan Yang
Youngjae Yu
Seonjoo Kim
DiffM, VGen, TTA
257
0
0
14 Oct 2025
BIGFix: Bidirectional Image Generation with Token Fixing
Victor Besnier
David Hurych
Andrei Bursuc
Eduardo Valle
VGen
149
0
0
14 Oct 2025
ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation
Ruihang Xu
Dewei Zhou
Fan Ma
Yi Yang
DiffM
187
2
0
13 Oct 2025
Massive Activations are the Key to Local Detail Synthesis in Diffusion Transformers
Chaofan Gan
Zicheng Zhao
Yuanpeng Tu
Xi Chen
Ziran Qin
Yun Xu
Mehrtash Harandi
W. Lin
161
1
0
13 Oct 2025
WaveletDiff: Multilevel Wavelet Diffusion For Time Series Generation
Yu-Hsiang Wang
O. Milenkovic
DiffM, AI4TS
343
0
0
13 Oct 2025
MosaicDiff: Training-free Structural Pruning for Diffusion Model Acceleration Reflecting Pretraining Dynamics
Bowei Guo
Shengkun Tang
Cong Zeng
Zhiqiang Shen
155
1
0
13 Oct 2025
Joint Discriminative-Generative Modeling via Dual Adversarial Training
Xuwang Yin
Claire Zhang
Julie Steele
Nir Shavit
T. T. Wang
GAN
435
0
0
13 Oct 2025
LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference
Jianhao Yuan
Fabio Pizzati
Francesco Pinto
Lars Kunze
Ivan Laptev
Paul Newman
Philip Torr
D. Martini
DiffM, VGen
179
2
0
13 Oct 2025
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
Ganlin Yang
Tianyi Zhang
Haoran Hao
Weiyun Wang
Y. Liu
...
Jiangmiao Pang
Gen Luo
Wenhai Wang
Yao Mu
Zhi Hou
LM&Ro, LRM
162
2
0
13 Oct 2025
Flow Matching-Based Autonomous Driving Planning with Advanced Interactive Behavior Modeling
Tianyi Tan
Yinan Zheng
Ruiming Liang
Zexu Wang
Kexin Zheng
Jinliang Zheng
Jianxiong Li
Xianyuan Zhan
Jingjing Liu
117
5
0
13 Oct 2025
DiffStyleTS: Diffusion Model for Style Transfer in Time Series
Mayank Nagda
Phil Ostheimer
Justus Arweiler
Indra Jungjohann
Jennifer Werner
...
Michael Bortz
Hans Hasse
Stephan Mandt
Marius Kloft
Sophie Fellenz
DiffM, AI4TS
108
0
0
13 Oct 2025
Diffusion Transformers with Representation Autoencoders
Boyang Zheng
Nanye Ma
Shengbang Tong
Saining Xie
DiffM
206
44
0
13 Oct 2025
DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training
Haoran Feng
D. Zhang
Xiangtai Li
Bo Du
Lu Qi
137
2
0
13 Oct 2025
Unified Open-World Segmentation with Multi-Modal Prompts
Yang Liu
Yufei Yin
Chenchen Jing
M. Zhu
Hao Chen
Yuling Xi
Bo Feng
Hao Wang
Shiyu Li
Chunhua Shen
VLM
107
0
0
12 Oct 2025
AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes
Yu Li
Menghan Xia
Gongye Liu
J. Bai
Xintao Wang
Conglang Zhang
Yuxuan Lin
Ruihang Chu
Pengfei Wan
Yujiu Yang
VGen
107
1
0
12 Oct 2025
UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation
Zhengrong Yue
H. Zhang
Xiangyu Zeng
Boyu Chen
Chenting Wang
...
Lu Dong
Kunpeng Du
Yi Wang
Limin Wang
Yali Wang
190
7
0
12 Oct 2025
Latent Retrieval Augmented Generation of Cross-Domain Protein Binders
Zishen Zhang
Xiangzhe Kong
Wenbing Huang
Yang Liu
DiffM
190
0
0
12 Oct 2025
DEMO: Disentangled Motion Latent Flow Matching for Fine-Grained Controllable Talking Portrait Synthesis
Peiyin Chen
Zhuowei Yang
Hui Feng
Sheng Jiang
Rui Yan
DiffM, VGen
99
0
0
12 Oct 2025
Head-wise Adaptive Rotary Positional Encoding for Fine-Grained Image Generation
Jiaye Li
Baoyou Chen
Hui Li
Zilong Dong
Jingdong Wang
Siyu Zhu
85
0
0
12 Oct 2025
ProteinAE: Protein Diffusion Autoencoders for Structure Encoding
Shaoning Li
Le Zhuo
Yusong Wang
Mingyu Li
Xinheng He
Fandi Wu
Jiaming Song
Pheng-Ann Heng
DiffM
133
0
0
12 Oct 2025
EditCast3D: Single-Frame-Guided 3D Editing with Video Propagation and View Selection
Huaizhi Qu
Ruichen Zhang
Shuqing Luo
Luchao Qi
Zhihao Zhang
Xiaoming Liu
Roni Sengupta
Tianlong Chen
DiffM, VGen
139
0
0
11 Oct 2025
Multi-Scale Diffusion Transformer for Jointly Simulating User Mobility and Mobile Traffic Pattern
Ziyi Liu
Qingyue Long
Zhiwen Xue
Huandong Wang
Yong Li
79
0
0
11 Oct 2025
SyncLipMAE: Contrastive Masked Pretraining for Audio-Visual Talking-Face Representation
Zeyu Ling
Xiaodong Gu
Jiangnan Tang
Changqing Zou
CLIP
157
0
0
11 Oct 2025
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
Jinliang Zheng
Jianxiong Li
Zhihao Wang
Dongxiu Liu
Xirui Kang
...
Ya-Qin Zhang
Jiangmiao Pang
Jingjing Liu
Tai Wang
Xianyuan Zhan
LM&Ro
239
15
0
11 Oct 2025
SaFiRe: Saccade-Fixation Reiteration with Mamba for Referring Image Segmentation
Zhenjie Mao
Yuhuan Yang
Chaofan Ma
Dongsheng Jiang
Jiangchao Yao
Ya Zhang
Yanfeng Wang
124
0
0
11 Oct 2025
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
Chenyu Wang
Paria Rashidinejad
DiJia Su
Song Jiang
S. Wang
...
Shannon Zejiang Shen
Feiyu Chen
Tommi Jaakkola
Yuandong Tian
Bo Liu
OffRL
217
7
0
10 Oct 2025
A PCA-based Data Prediction Method
Baltic Journal of Modern Computing (BJMC), 2025
Peteris Daugulis
Vija Vagale
Emiliano Mancini
Filippo Castiglione
150
4
0
10 Oct 2025
Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy
Xiaoxiao Ma
Feng Zhao
Pengyang Ling
Haibo Qiu
Zhixiang Wei
Hu Yu
Jie Huang
Zhixiong Zeng
Lin Ma
176
2
0
10 Oct 2025
DiTSinger: Scaling Singing Voice Synthesis with Diffusion Transformer and Implicit Alignment
Zongcai Du
Guilin Deng
Xiaofeng Guo
Xin Gao
Linke Li
...
Fubo Han
Siyu Yang
Peng Liu
Pan Zhong
Qiang Fu
DiffM
311
1
0
10 Oct 2025
Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation
Yao Teng
Fuyun Wang
Xian Liu
Z. Chen
Han Shi
Yu Wang
Zhenguo Li
Weiyang Liu
Difan Zou
Xihui Liu
DiffM
134
0
0
10 Oct 2025
HeadsUp! High-Fidelity Portrait Image Super-Resolution
Renjie Li
Zihao Zhu
X. Wang
Zhengzhong Tu
DiffM, SupR
275
0
0
10 Oct 2025
If Probable, Then Acceptable? Understanding Conditional Acceptability Judgments in Large Language Models
Jasmin Orth
Philipp Mondorf
Barbara Plank
ELM
264
0
0
09 Oct 2025
MultiCOIN: Multi-Modal COntrollable Video INbetweening
Maham Tanveer
Yang Zhou
Simon Niklaus
Ali Mahdavi-Amiri
Hao Zhang
Krishna Kumar Singh
Nanxuan Zhao
DiffM, VGen
185
1
0
09 Oct 2025
MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
Guobin Ma
Jixun Yao
Ziqian Ning
Yuepeng Jiang
Lingxin Xiong
Lei Xie
Pengcheng Zhu
DiffM, VGen
165
0
0
09 Oct 2025
A Honest Cross-Validation Estimator for Prediction Performance
Tianyu Pan
Vincent Z. Yu
Viswanath Devanarayan
Lu Tian
142
0
0
09 Oct 2025
TTOM: Test-Time Optimization and Memorization for Compositional Video Generation
Leigang Qu
Ziyang Wang
Na Zheng
Wenjie Wang
Liqiang Nie
Tat-Seng Chua
166
1
0
09 Oct 2025
VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning
Minghong Cai
Qiulin Wang
Zongli Ye
Wenze Liu
Quande Liu
Weicai Ye
X. Wang
Pengfei Wan
Kun Gai
Xiangyu Yue
VGen
96
1
0
09 Oct 2025
FlowLensing: Simulating Gravitational Lensing with Flow Matching
Hamees Sayed
Pranath Reddy
Michael W. Toomey
Sergei Gleyzer
200
0
0
09 Oct 2025
Graph Diffusion Transformers are In-Context Molecular Designers
Gang Liu
Jie Chen
Yihan Zhu
Michael Sun
Tengfei Luo
Nitesh Chawla
Meng Jiang
DiffM, AI4CE
98
1
0
09 Oct 2025
CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving
Tianrui Zhang
Yichen Liu
Zilin Guo
Yuxin Guo
Jingcheng Ni
Chenjing Ding
Dan Xu
Lewei Lu
Z. Wu
VGen
205
0
0
09 Oct 2025
FreqCa: Accelerating Diffusion Models via Frequency-Aware Caching
Jiacheng Liu
Peiliang Cai
Qinming Zhou
Yuqi Lin
Deyang Kong
...
Haowen Xu
Chang Zou
J. Tang
S. Zheng
Linfeng Zhang
104
1
0
09 Oct 2025
PAC Learnability in the Presence of Performativity
Ivan Kirev
Lyuben Baltadzhiev
Nikola Konstantinov
134
2
0
09 Oct 2025
UniMMVSR: A Unified Multi-Modal Framework for Cascaded Video Super-Resolution
Shian Du
Menghan Xia
Chang-rui Liu
Quande Liu
Xintao Wang
Pengfei Wan
Xiangyang Ji
VGen, SupR
275
0
0
09 Oct 2025
FlexTraj: Image-to-Video Generation with Flexible Point Trajectory Control
Zhiyuan Zhang
Can Wang
Dongdong Chen
Jing Liao
VGen
245
2
0
09 Oct 2025
A Diffusion Model for Regular Time Series Generation from Irregular Data with Completion and Masking
Gal Fadlon
Idan Arbiv
Nimrod Berman
Omri Azencot
DiffM, MedIm
161
2
0
08 Oct 2025
scPPDM: A Diffusion Model for Single-Cell Drug-Response Prediction
Zhaokang Liang
Shuyang Zhuang
Xiaoran Jiao
Weian Mao
Hao Chen
Chunhua Shen
74
0
0
08 Oct 2025
Revisiting Mixout: An Overlooked Path to Robust Finetuning
Masih Aminbeidokhti
H. R. Medeiros
Eric Granger
M. Pedersoli
UQCV
243
0
0
08 Oct 2025
Page 8 of 55