Scalable Diffusion Models with Transformers

IEEE International Conference on Computer Vision (ICCV), 2023

19 December 2022

William S. Peebles

Papers citing "Scalable Diffusion Models with Transformers"

50 / 2,711 papers shown

LAMIC: Layout-Aware Multi-Image Composition via Scalability of Multimodal Diffusion Transformer

167

2

0

24 Dec 2025

Denoise to Track: Harnessing Video Diffusion Priors for Robust Correspondence

238

0

0

04 Dec 2025

YingMusic-SVC: Real-World Robust Zero-Shot Singing Voice Conversion with Flow-GRPO and Singing-Specific Inductive Biases

Wei-Qiang Zhang

49

0

0

04 Dec 2025

Efficient Generative Transformer Operators For Million-Point PDEs

Armand K. Koupai

Patrick Gallinari

68

0

0

04 Dec 2025

Refaçade: Editing Object with Given Reference Texture

180

0

0

04 Dec 2025

ReflexFlow: Rethinking Learning Objective for Exposure Bias Alleviation in Flow Matching

...

145

0

0

04 Dec 2025

UniTS: Unified Time Series Generative Model for Remote Sensing

...

266

0

0

04 Dec 2025

Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image

235

0

0

04 Dec 2025

VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory

...

182

1

0

04 Dec 2025

A Sanity Check for Multi-In-Domain Face Forgery Detection in the Real World

123

0

0

04 Dec 2025

AdaPower: Specializing World Foundation Models for Predictive Manipulation

80

0

0

03 Dec 2025

ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers

160

0

0

03 Dec 2025

Beyond Boundary Frames: Audio-Visual Semantic Guidance for Context-Aware Video Interpolation

223

0

0

03 Dec 2025

CoDA: From Text-to-Image Diffusion Models to Training-Free Dataset Distillation

153

0

0

03 Dec 2025

GeoVideo: Introducing Geometric Regularization into Video Generation Model

459

2

0

03 Dec 2025

CSMapping: Scalable Crowdsourced Semantic Mapping and Topology Inference for Autonomous Driving

Chih-Chung Chou

102

0

0

03 Dec 2025

C3G: Learning Compact 3D Representations with 2K Gaussians

...

Takuya Narihira

215

0

0

03 Dec 2025

FloodDiffusion: Tailored Diffusion Forcing for Streaming Motion Generation

114

0

0

03 Dec 2025

SimFlow: Simplified and End-to-End Training of Latent Normalizing Flows

Guangting Zheng

187

0

0

03 Dec 2025

YingVideo-MV: Music-Driven Multi-Stage Video Generation

244

0

0

02 Dec 2025

MAViD: A Multimodal Framework for Audio-Visual Dialogue Understanding and Generation

127

0

0

02 Dec 2025

Hear What Matters! Text-conditioned Selective Video-to-Audio Generation

112

0

0

02 Dec 2025

MultiShotMaster: A Controllable Multi-Shot Video Generation Framework

215

2

0

02 Dec 2025

Video2Act: A Dual-System Video Diffusion Policy with Robotic Spatio-Motional Modeling

Shanghang Zhang

306

1

0

02 Dec 2025

Taming Camera-Controlled Video Generation with Verifiable Geometry Reward

156

0

0

02 Dec 2025

Generative Editing in the Joint Vision-Language Space for Zero-Shot Composed Image Retrieval

154

0

0

01 Dec 2025

Open-world Hand-Object Interaction Video Generation Based on Structure and Contact-aware Representation

...

72

0

0

01 Dec 2025

DreamingComics: A Story Visualization Pipeline via Subject and Layout Customized Generation using Video Models

DiffM AI4TS VGen

153

0

0

01 Dec 2025

Reversible Inversion for Training-Free Exemplar-guided Image Editing

131

0

0

01 Dec 2025

ViT$^3$: Unlocking Test-Time Training in Vision

76

0

0

01 Dec 2025

Modality-Augmented Fine-Tuning of Foundation Robot Policies for Cross-Embodiment Manipulation on GR1 and G1

113

0

0

01 Dec 2025

Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe

114

0

0

01 Dec 2025

TokenPure: Watermark Removal through Tokenized Appearance and Structural Guidance

201

0

0

01 Dec 2025

TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models

...

162

1

0

01 Dec 2025

FRAMER: Frequency-Aligned Self-Distillation with Adaptive Modulation Leveraging Diffusion Priors for Real-World Image Super-Resolution

162

0

0

01 Dec 2025

SpriteHand: Real-Time Versatile Hand-Object Interaction with Autoregressive Video Generation

189

0

0

01 Dec 2025

Improved Mean Flows: On the Challenges of Fastforward Generative Models

138

3

0

01 Dec 2025

ResDiT: Evoking the Intrinsic Resolution Scalability in Diffusion Transformers

97

0

0

01 Dec 2025

Dynamic-eDiTor: Training-Free Text-Driven 4D Scene Editing with Multimodal Diffusion Transformer

DiffM 3DGS VGen

145

0

0

30 Nov 2025

CycleManip: Enabling Cyclic Task Manipulation via Effective Historical Perception and Understanding

57

0

0

30 Nov 2025

Silhouette-based Gait Foundation Model

Vishal M. Patel

65

0

0

30 Nov 2025

TrajDiff: End-to-end Autonomous Driving without Perception Annotation

80

1

0

30 Nov 2025

Audio-Visual World Models: Towards Multisensory Imagination in Sight and Sound

162

0

0

30 Nov 2025

UniDiff: Parameter-Efficient Adaptation of Diffusion Models for Land Cover Classification with Multi-Modal Remotely Sensed Imagery and Sparse Annotations

67

0

0

29 Nov 2025

Optimizing Distributional Geometry Alignment with Optimal Transport for Generative Dataset Distillation

232

1

0

29 Nov 2025

Image Generation as a Visual Planner for Robotic Manipulation

90

0

0

29 Nov 2025

PhysGen: Physically Grounded 3D Shape Generation for Industrial Design

90

0

0

29 Nov 2025

What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards

Vicky Kalogeiton

Dimitris Samaras

91

1

0

29 Nov 2025

CC-FMO: Camera-Conditioned Zero-Shot Single Image to 3D Scene Generation with Foundation Model Orchestration

196

0

0

29 Nov 2025

LAP: Fast LAtent Diffusion Planner with Fine-Grained Feature Distillation for Autonomous Driving

188

0

0

29 Nov 2025
