Scalable Diffusion Models with Transformers

IEEE International Conference on Computer Vision (ICCV), 2023

19 December 2022

William S. Peebles

Papers citing "Scalable Diffusion Models with Transformers"

50 / 2,712 papers shown

Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning

257

0

0

19 Nov 2025

First Frame Is the Place to Go for Video Content Customization

Cornelia Fermüller

Brandon Yushan Feng

Yiannis Aloimonos

207

0

0

19 Nov 2025

BD-Net: Has Depth-Wise Convolution Ever Been Applied in Binary Neural Networks?

162

3

0

19 Nov 2025

Insert In Style: A Zero-Shot Generative Framework for Harmonious Cross-Domain Object Composition

Raghu Chittersu

Yuvraj Singh Rathore

268

0

0

19 Nov 2025

SplitFlux: Learning to Decouple Content and Style from a Single Image

231

1

0

19 Nov 2025

Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion

109

0

0

18 Nov 2025

Text2Traffic: A Text-to-Image Generation and Editing Method for Traffic Scenes

310

0

0

17 Nov 2025

ActVAR: Activating Mixtures of Weights and Tokens for Efficient Visual Autoregressive Generation

138

0

0

17 Nov 2025

Recurrent Autoregressive Diffusion: Global Memory Meets Local Attention

Christina Zhang

173

2

0

17 Nov 2025

GenTract: Generative Global Tractography

Elinor Thompson

Daniel C. Alexander

238

0

0

17 Nov 2025

Distribution Matching Distillation Meets Reinforcement Learning

Liuzhuozheng Li

...

426

2

0

17 Nov 2025

MeanFlow Transformers with Representation Autoencoders

226

1

0

17 Nov 2025

Towards High-Consistency Embodied World Model with Multi-View Trajectory Videos

260

0

0

17 Nov 2025

Training-Free Multi-View Extension of IC-Light for Textual Position-Aware Scene Relighting

185

0

0

17 Nov 2025

Generative Photographic Control for Scene-Consistent Video Cinematic Editing

...

Chen Change Loy

177

0

0

17 Nov 2025

DINO-Detect: A Simple yet Effective Framework for Blur-Robust AI-Generated Image Detection

...

243

0

0

16 Nov 2025

TempoMaster: Efficient Long Video Generation via Next-Frame-Rate Prediction

119

1

0

16 Nov 2025

GeoMVD: Geometry-Enhanced Multi-View Generation Model Based on Geometric Information Extraction

315

0

0

15 Nov 2025

TIMERIPPLE: Accelerating vDiTs by Understanding the Spatio-Temporal Correlations in Latent Space

199

0

0

15 Nov 2025

ProAV-DiT: A Projected Latent Diffusion Transformer for Efficient Synchronized Audio-Video Generation

195

0

0

15 Nov 2025

Adaptive Begin-of-Video Tokens for Autoregressive Video Diffusion Models

260

0

0

15 Nov 2025

Mixture of States: Routing Token-Level Dynamics for Multimodal Generation

...

Juan-Manuel Perez-Rua

Jürgen Schmidhuber

105

0

0

15 Nov 2025

A Best-of-Both-Worlds Proof for Tsallis-INF without Fenchel Conjugates

Francesco Orabona

126

19

0

14 Nov 2025

Depth Anything 3: Recovering the Visual Space from Any Views

713

17

0

13 Nov 2025

nuPlan-R: A Closed-Loop Planning Benchmark for Autonomous Driving via Reactive Multi-Agent Simulation

308

0

0

13 Nov 2025

Generative AI Meets 6G and Beyond: Diffusion Models for Semantic Communications

Khaled B. Letaief

429

0

0

11 Nov 2025

oboro: Text-to-Image Synthesis on Limited Data using Flow-based Diffusion Transformer with MMH Attention

Ryusuke Mizutani

Tsugumi Kadowaki

169

0

0

11 Nov 2025

Beyond Randomness: Understand the Order of the Noise in Diffusion

322

0

0

11 Nov 2025

Simulating the Visual World with Artificial Intelligence: A Roadmap

489

1

0

11 Nov 2025

E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis

189

0

0

10 Nov 2025

GNN-Enabled Robust Hybrid Beamforming with Score-Based CSI Generation and Denoising

60

1

0

10 Nov 2025

RelightMaster: Precise Video Relighting with Multi-plane Light Images

230

2

0

09 Nov 2025

Latent Refinement via Flow Matching for Training-free Linear Inverse Problem Solving

187

0

0

08 Nov 2025

Neodragon: Mobile Video Generation using Diffusion Transformer

Animesh Karnewar

Denis Korzhenkov

Ioannis Lelekas

...

Mohsen Ghafoorian

160

2

0

08 Nov 2025

Enhancing Diffusion Model Guidance through Calibration and Regularization

Seyed Alireza Javid

Amirhossein Bagheri

Nuria González-Prelcic

194

0

0

08 Nov 2025

FreeControl: Efficient, Training-Free Structural Control via One-Step Attention Extraction

134

0

0

07 Nov 2025

Rethinking Metrics and Diffusion Architecture for 3D Point Cloud Generation

David Ryckelynck

Yannick Tillier

Etienne Decencière

328

0

0

07 Nov 2025

VLM-driven Skill Selection for Robotic Assembly Tasks

92

0

0

07 Nov 2025

TwinVLA: Data-Efficient Bimanual Manipulation with Twin Single-Arm Vision-Language-Action Models

84

0

0

07 Nov 2025

On Flow Matching KL Divergence

Jerry Yao-Chieh Hu

342

0

0

07 Nov 2025

MusRec: Zero-Shot Text-to-Music Editing via Rectified Flow and Diffusion Transformers

348

0

0

06 Nov 2025

InfinityStar: Unified Spacetime AutoRegressive Modeling for Visual Generation

271

5

0

06 Nov 2025

Evo-1: Lightweight Vision-Language-Action Model with Preserved Semantic Alignment

...

169

4

0

06 Nov 2025

Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models

269

2

0

05 Nov 2025

UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions

242

5

0

05 Nov 2025

Scalable Single-Cell Gene Expression Generation with Latent Diffusion Models

Payam Dibaeinia

James D. Pearce

Theofanis Karaletsos

Jakub M. Tomczak

179

1

0

04 Nov 2025

Towards One-step Causal Video Generation via Adversarial Self-Distillation

Jiangning Zhang

206

3

0

03 Nov 2025

Diffusion Transformer meets Multi-level Wavelet Spectrum for Single Image Super-Resolution

Paul Barom Jeon

412

0

0

03 Nov 2025

Lightweight Learning from Actuation-Space Demonstrations via Flow Matching for Whole-Body Soft Robotic Grasping

Ibrahim Alsarraj

160

0

0

03 Nov 2025

Fractional Diffusion Bridge Models

Maximilian Springenberg

Christoph Knochenhauer

163

0

0

03 Nov 2025
