arXiv:2212.09748

Scalable Diffusion Models with Transformers

IEEE International Conference on Computer Vision (ICCV), 2023

19 December 2022

William S. Peebles

Papers citing "Scalable Diffusion Models with Transformers"

50 / 2,711 papers shown

Flash-DMD: Towards High-Fidelity Few-Step Image Generation with Efficient Distillation and Joint Reinforcement Learning

200

1

0

25 Nov 2025

A Reason-then-Describe Instruction Interpreter for Controllable Video Generation

184

0

0

25 Nov 2025

DINO-Tok: Adapting DINO for Visual Tokenizers

...

192

0

0

25 Nov 2025

STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flows

Miguel Angel Bautista

David Berthelot

303

3

0

25 Nov 2025

Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation

...

321

1

0

25 Nov 2025

A Training-Free Approach for Multi-ID Customization via Attention Adjustment and Spatial Control

276

0

0

25 Nov 2025

DUO-TOK: Dual-Track Semantic Music Tokenizer for Vocal-Accompaniment Generation

168

1

0

25 Nov 2025

UltraViCo: Breaking Extrapolation Limits in Video Diffusion Transformers

129

0

0

25 Nov 2025

PixelDiT: Pixel Diffusion Transformers for Image Generation

268

0

0

25 Nov 2025

Learning Plug-and-play Memory for Guiding Video Diffusion Models

284

0

0

24 Nov 2025

EnfoPath: Energy-Informed Analysis of Generative Trajectories in Flow Matching

Henrik Boström

113

1

0

24 Nov 2025

Learning What to Trust: Bayesian Prior-Guided Optimization for Visual Generation

95

0

0

24 Nov 2025

DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation

137

2

0

24 Nov 2025

One4D: Unified 4D Generation and Reconstruction via Decoupled LoRA Control

164

0

0

24 Nov 2025

FVAR: Visual Autoregressive Modeling via Next Focus Prediction

158

0

0

24 Nov 2025

PartDiffuser: Part-wise 3D Mesh Generation via Discrete Diffusion

108

0

0

24 Nov 2025

Demystifying Diffusion Objectives: Reweighted Losses are Better Variational Bounds

Michalis K. Titsias

268

0

0

24 Nov 2025

Cloud4D: Estimating Cloud Properties at a High Spatial and Temporal Resolution

Edward Gryspeerdt

413

0

0

24 Nov 2025

DiP: Taming Diffusion Models in Pixel Space

Jiangning Zhang

285

0

0

24 Nov 2025

SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation

...

279

0

0

24 Nov 2025

LATTICE: Democratize High-Fidelity 3D Generation at Scale

52

1

0

24 Nov 2025

Eevee: Towards Close-up High-resolution Video-based Virtual Try-on

191

0

0

24 Nov 2025

Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers

Mike Zheng Shou

324

0

0

24 Nov 2025

Terminal Velocity Matching

70

0

0

24 Nov 2025

Understanding, Accelerating, and Improving MeanFlow Training

L. Bogensperger

Nikolai Kalischek

Federico Tombari

Konrad Schindler

Dominik Narnhofer

232

0

0

24 Nov 2025

One Attention, One Scale: Phase-Aligned Rotary Positional Embeddings for Mixed-Resolution Diffusion Transformer

Dimitris Samaras

92

0

0

24 Nov 2025

View-Consistent Diffusion Representations for 3D-Consistent Video Generation

Duolikun Danier

Steven McDonagh

Oisin Mac Aodha

135

0

0

24 Nov 2025

When Generative Replay Meets Evolving Deepfakes: Domain-Aware Relative Weighting for Incremental Face Forgery Detection

112

0

0

23 Nov 2025

MammothModa2: A Unified AR-Diffusion Framework for Multimodal Understanding and Generation

...

101

0

0

23 Nov 2025

TRIDENT: A Trimodal Cascade Generative Framework for Drug and RNA-Conditioned Cellular Morphology Synthesis

89

0

0

23 Nov 2025

Zero-Shot Video Deraining with Video Diffusion Models

Juan Luis Gonzalez

148

0

0

23 Nov 2025

Pistachio: Towards Synthetic, Balanced, and Long-Form Video Anomaly Benchmarks

170

0

0

22 Nov 2025

EgoControl: Controllable Egocentric Video Generation via 3D Full-Body Poses

Enrico Pallotta

Sina Mokhtarzadeh Azar

130

0

0

22 Nov 2025

Plan-X: Instruct Video Generation via Semantic Planning

Guillermo Sapiro

96

0

0

22 Nov 2025

UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios

92

0

0

22 Nov 2025

One-Step Diffusion Transformer for Controllable Real-World Image Super-Resolution

330

0

0

21 Nov 2025

UAM: A Unified Attention-Mamba Backbone of Multimodal Framework for Tumor Cell Classification

280

0

0

21 Nov 2025

Spanning Tree Autoregressive Visual Generation

Stanley Jungkyu Choi

204

0

0

21 Nov 2025

Loomis Painter: Reconstructing the Painting Process

Markus Pobitzer

237

0

0

21 Nov 2025

PostCam: Camera-Controllable Novel-View Video Generation with Query-Shared Cross-Attention

175

1

0

21 Nov 2025

SPIDER: Spatial Image CorresponDence Estimator for Robust Calibration

Abhay Kumar Yadav

Cheng-Fang Peng

81

0

0

21 Nov 2025

RynnVLA-002: A Unified Vision-Language-Action and World Model

...

324

1

0

21 Nov 2025

Counterfactual World Models via Digital Twin-conditioned Video Diffusion

Mathias Unberath

165

0

0

21 Nov 2025

Flow and Depth Assisted Video Prediction with Latent Transformer

Eliyas Suleyman

Nicolas Pugeault

149

0

0

20 Nov 2025

SAM 3D: 3Dfy Anything in Images

...

Georgia Gkioxari

346

5

0

20 Nov 2025

Decoupling Complexity from Scale in Latent Diffusion Model

Tianxiong Zhong

317

1

0

20 Nov 2025

Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight

207

0

0

20 Nov 2025

TriDiff-4D: Fast 4D Generation through Diffusion-based Triplane Re-posing

Eddie Pokming Sheung

Prakhar Kaushik

135

0

0

20 Nov 2025

NaTex: Seamless Texture Generation as Latent Color Diffusion

175

0

0

20 Nov 2025

SplitFlux: Learning to Decouple Content and Style from a Single Image

212

0

0

19 Nov 2025
