Scalable Diffusion Models with Transformers

IEEE International Conference on Computer Vision (ICCV), 2023
19 December 2022
William S. Peebles
Saining Xie
    GNN
arXiv: 2212.09748

Papers citing "Scalable Diffusion Models with Transformers"

50 / 2,711 papers shown
Flash-DMD: Towards High-Fidelity Few-Step Image Generation with Efficient Distillation and Joint Reinforcement Learning
Guanjie Chen
Shirui Huang
Kai Liu
J. Zhu
Xiaoye Qu
Peng Chen
Yu Cheng
Yifu Sun
200
1
0
25 Nov 2025
A Reason-then-Describe Instruction Interpreter for Controllable Video Generation
Shengqiong Wu
Weicai Ye
Y. Zhang
Jiahao Wang
Quande Liu
Xintao Wang
Pengfei Wan
Kun Gai
Hao Fei
Tat-Seng Chua
VGen, LRM
184
0
0
25 Nov 2025
DINO-Tok: Adapting DINO for Visual Tokenizers
Mingkai Jia
Mingxiao Li
Liaoyuan Fan
Tianxing Shi
Jiaxin Guo
...
Xiaoyang Guo
Xiao-Xiao Long
Qian Zhang
P. Tan
Wei Yin
ViT
192
0
0
25 Nov 2025
STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flows
Jiatao Gu
Ying Shen
Tianrong Chen
Laurent Dinh
Y. Wang
Miguel Angel Bautista
David Berthelot
Josh Susskind
Shuangfei Zhai
DiffM, VGen
303
3
0
25 Nov 2025
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation
Inferix Team
Tianyu Feng
Yizeng Han
Jiahao He
Yuanyu He
...
Jichao Wu
M. Yang
Yinghao Yu
Zeyu Zhang
Bohan Zhuang
VGen, SyDa
321
1
0
25 Nov 2025
A Training-Free Approach for Multi-ID Customization via Attention Adjustment and Spatial Control
Jiawei Lin
Guanlong Jiao
Jianjin Xu
276
0
0
25 Nov 2025
DUO-TOK: Dual-Track Semantic Music Tokenizer for Vocal-Accompaniment Generation
Rui Lin
Zhiyue Wu
Jiahe Le
Kangdi Wang
Weixiong Chen
Junyu Dai
Tao Jiang
168
1
0
25 Nov 2025
UltraViCo: Breaking Extrapolation Limits in Video Diffusion Transformers
Min Zhao
Hongzhou Zhu
Y. Wang
Bokai Yan
J. Zhang
Guande He
Ling Yang
Chongxuan Li
Jun-Jie Zhu
129
0
0
25 Nov 2025
PixelDiT: Pixel Diffusion Transformers for Image Generation
Yongsheng Yu
Wei Xiong
Weili Nie
Yichen Sheng
Shiqiu Liu
Jiebo Luo
268
0
0
25 Nov 2025
Learning Plug-and-play Memory for Guiding Video Diffusion Models
Selena Song
Ziming Xu
Zijun Zhang
Kun Zhou
Jiaxian Guo
Lianhui Qin
Biwei Huang
VGen
284
0
0
24 Nov 2025
EnfoPath: Energy-Informed Analysis of Generative Trajectories in Flow Matching
Ziyun Li
Ben Dai
Huancheng Hu
Henrik Boström
Soon Hoe Lim
VGen
113
1
0
24 Nov 2025
Learning What to Trust: Bayesian Prior-Guided Optimization for Visual Generation
Ruiying Liu
Yuanzhi Liang
Haibin Huang
Tianshu Yu
Chi Zhang
95
0
0
24 Nov 2025
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
Zehong Ma
Longhui Wei
Shuai Wang
Shiliang Zhang
Qi Tian
DiffM
137
2
0
24 Nov 2025
One4D: Unified 4D Generation and Reconstruction via Decoupled LoRA Control
Zhenxing Mi
Yuxin Wang
Dan Xu
VGen
164
0
0
24 Nov 2025
FVAR: Visual Autoregressive Modeling via Next Focus Prediction
Xiaofan Li
Chenming Wu
Yanpeng Sun
Jiaming Zhou
Delin Qu
Yansong Qu
Weihao Bo
Haibao Yu
Dingkang Liang
VGen
158
0
0
24 Nov 2025
PartDiffuser: Part-wise 3D Mesh Generation via Discrete Diffusion
Yichen Yang
Hong Li
Haodong Zhu
L. Yang
Guojun Lei
Sheng Xu
Baochang Zhang
DiffM
108
0
0
24 Nov 2025
Demystifying Diffusion Objectives: Reweighted Losses are Better Variational Bounds
Jiaxin Shi
Michalis K. Titsias
DiffM
268
0
0
24 Nov 2025
Cloud4D: Estimating Cloud Properties at a High Spatial and Temporal Resolution
Jacob Lin
Edward Gryspeerdt
Ronald Clark
AI4Cl
413
0
0
24 Nov 2025
DiP: Taming Diffusion Models in Pixel Space
Z. Chen
J. Zhu
Xu Chen
Jiangning Zhang
Xiaobin Hu
Hanzhen Zhao
C. Wang
Jian Yang
Ying Tai
285
0
0
24 Nov 2025
SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation
J. Zhang
Shengming Cao
Rui Li
Xiaotong Zhao
Yutao Cui
...
Gangshan Wu
Haolan Chen
Yu-Syuan Xu
L. Wang
Kai Ma
VGen
279
0
0
24 Nov 2025
LATTICE: Democratize High-Fidelity 3D Generation at Scale
Zeqiang Lai
Yunfei Zhao
Zibo Zhao
Haolin Liu
Qingxiang Lin
Jingwei Huang
Chunchao Guo
Xiangyu Yue
52
1
0
24 Nov 2025
Eevee: Towards Close-up High-resolution Video-based Virtual Try-on
Jianhao Zeng
Y. Bai
Ruidong Chen
Xuanpu Zhang
Lei-huan Sun
Dongyang Jin
Ryan Xu
Nannan Zhang
Dan Song
Xiangxiang Chu
191
0
0
24 Nov 2025
Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers
Yiqing Shi
Yiren Song
Mike Zheng Shou
DiffM, MDE
324
0
0
24 Nov 2025
Terminal Velocity Matching
Linqi Zhou
Mathias Parger
Ayaan Haque
Jiaming Song
70
0
0
24 Nov 2025
Understanding, Accelerating, and Improving MeanFlow Training
J. Kim
Hyojun Go
L. Bogensperger
Julius Erbach
Nikolai Kalischek
Federico Tombari
Konrad Schindler
Dominik Narnhofer
AI4CE
232
0
0
24 Nov 2025
One Attention, One Scale: Phase-Aligned Rotary Positional Embeddings for Mixed-Resolution Diffusion Transformer
Haoyu Wu
Jingyi Xu
Qiaomu Miao
Dimitris Samaras
H. Le
92
0
0
24 Nov 2025
View-Consistent Diffusion Representations for 3D-Consistent Video Generation
Duolikun Danier
Ge Gao
Steven McDonagh
Changjian Li
Hakan Bilen
Oisin Mac Aodha
DiffM, VGen
135
0
0
24 Nov 2025
When Generative Replay Meets Evolving Deepfakes: Domain-Aware Relative Weighting for Incremental Face Forgery Detection
Hao Shen
Jikang Cheng
Renye Yan
Zhongyuan Wang
Wei Peng
Baojin Huang
112
0
0
23 Nov 2025
MammothModa2: A Unified AR-Diffusion Framework for Multimodal Understanding and Generation
Tao Shen
Xin Wan
Taicai Chen
Rui Zhang
Junwen Pan
...
Y. Yang
Chen Cheng
Qi She
Chang Liu
Zhenbang Sun
DiffM
101
0
0
23 Nov 2025
TRIDENT: A Trimodal Cascade Generative Framework for Drug and RNA-Conditioned Cellular Morphology Synthesis
Rui Peng
Ziru Liu
Lingyuan Ye
Yuxing Lu
Boxin Shi
Jinzhuo Wang
89
0
0
23 Nov 2025
Zero-Shot Video Deraining with Video Diffusion Models
Tuomas Varanka
Juan Luis Gonzalez
Hyeongwoo Kim
Pablo Garrido
Xu Yao
DiffM, VGen
148
0
0
23 Nov 2025
Pistachio: Towards Synthetic, Balanced, and Long-Form Video Anomaly Benchmarks
Jie Li
Hongyi Cai
Mingkang Dong
Muxin Pu
Shan You
Fei Wang
Tao Huang
170
0
0
22 Nov 2025
EgoControl: Controllable Egocentric Video Generation via 3D Full-Body Poses
Enrico Pallotta
Sina Mokhtarzadeh Azar
Lars Doorenbos
Serdar Ozsoy
Umar Iqbal
Juergen Gall
DiffM, VGen
130
0
0
22 Nov 2025
Plan-X: Instruct Video Generation via Semantic Planning
Lun Huang
You Xie
Hongyi Xu
Tianpei Gu
Chenxu Zhang
Guoxian Song
Zenan Li
Xiaochen Zhao
Linjie Luo
Guillermo Sapiro
DiffM, VGen
96
0
0
22 Nov 2025
UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios
Tian Ye
Song Fei
Lei Zhu
92
0
0
22 Nov 2025
One-Step Diffusion Transformer for Controllable Real-World Image Super-Resolution
Yushun Fang
Yuxiang Chen
S. Yin
Qiang Hu
Jiangchao Yao
Ya Zhang
Xiaoyun Zhang
Y. Wang
330
0
0
21 Nov 2025
UAM: A Unified Attention-Mamba Backbone of Multimodal Framework for Tumor Cell Classification
Taixi Chen
Jingyun Chen
Nancy Guo
Mamba
280
0
0
21 Nov 2025
Spanning Tree Autoregressive Visual Generation
Sangkyu Lee
Changho Lee
Janghoon Han
Hosung Song
Tackgeun You
Hwasup Lim
Stanley Jungkyu Choi
Honglak Lee
Youngjae Yu
204
0
0
21 Nov 2025
Loomis Painter: Reconstructing the Painting Process
Markus Pobitzer
Chang Liu
Chenyi Zhuang
Teng Long
Bin Ren
Nicu Sebe
DiffM
237
0
0
21 Nov 2025
PostCam: Camera-Controllable Novel-View Video Generation with Query-Shared Cross-Attention
Yipeng Chen
Zhichao Ye
Zhenzhou Fang
Xinyu Chen
Xiaoyu Zhang
Jialing Liu
Nan Wang
Haomin Liu
Guofeng Zhang
DiffM, VGen
175
1
0
21 Nov 2025
SPIDER: Spatial Image CorresponDence Estimator for Robust Calibration
Zhimin Shao
Abhay Kumar Yadav
Rama Chellappa
Cheng-Fang Peng
81
0
0
21 Nov 2025
RynnVLA-002: A Unified Vision-Language-Action and World Model
Jun Cen
Siteng Huang
Yuqian Yuan
Kehan Li
Hangjie Yuan
...
Xin Li
Hao Luo
Fan Wang
Deli Zhao
H. Chen
VGen, SyDa
324
1
0
21 Nov 2025
Counterfactual World Models via Digital Twin-conditioned Video Diffusion
Yiqing Shen
Aiza Maksutova
Chenjia Li
Mathias Unberath
DiffM, VGen
165
0
0
21 Nov 2025
Flow and Depth Assisted Video Prediction with Latent Transformer
Eliyas Suleyman
Paul Henderson
Eksan Firkat
Nicolas Pugeault
149
0
0
20 Nov 2025
SAM 3D: 3Dfy Anything in Images
SAM 3D Team
Xingyu Chen
Fu-Jen Chu
Pierre Gleize
Kevin J. Liang
...
Bowen Zhang
Piotr Dollár
Georgia Gkioxari
Matt Feiszli
Jitendra Malik
346
5
0
20 Nov 2025
Decoupling Complexity from Scale in Latent Diffusion Model
Tianxiong Zhong
Xingye Tian
X. Wang
Boyuan Jiang
Xin Tao
Pengfei Wan
DiffM
317
1
0
20 Nov 2025
Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight
Yi Yang
X. Li
Yiyang Chen
Jin Song
Yihan Wang
Zipeng Xiao
Jiadi Su
You Qiaoben
Pengfei Liu
Zhijie Deng
VLM
207
0
0
20 Nov 2025
TriDiff-4D: Fast 4D Generation through Diffusion-based Triplane Re-posing
Eddie Pokming Sheung
Qihao Liu
Wufei Ma
Prakhar Kaushik
Jianwen Xie
Alan Yuille
135
0
0
20 Nov 2025
NaTex: Seamless Texture Generation as Latent Color Diffusion
Zeqiang Lai
Yunfei Zhao
Zibo Zhao
Xin Yang
Xin Huang
J. Huang
Xiangyu Yue
Chunchao Guo
DiffM
175
0
0
20 Nov 2025
SplitFlux: Learning to Decouple Content and Style from a Single Image
Yitong Yang
Y Samuel Wang
Changshuo Wang
Yongjun Zhang
Ziyang Chen
Shuting He
212
0
0
19 Nov 2025