Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

arXiv: 2311.15127

25 November 2023

Daniel Mendelevitch

Vikram S. Voleti

ArXiv (abs) | PDF | HTML | HuggingFace (13 upvotes) | GitHub (25943★)

Papers citing "Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets"

50 / 1,008 papers shown

Online Video Depth Anything: Temporally-Consistent Depth Prediction with Low Memory Consumption

Johann-Friedrich Feiden

Bogdan Savchynskyy

121

0

0

10 Oct 2025

UniVideo: Unified Understanding, Generation, and Editing for Videos

262

14

0

09 Oct 2025

NovaFlow: Zero-Shot Manipulation via Actionable Flow from Generated Videos

George Konidaris

133

5

0

09 Oct 2025

VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning

93

0

0

09 Oct 2025

An approach for systematic decomposition of complex LLM tasks

148

0

0

09 Oct 2025

UniMMVSR: A Unified Multi-Modal Framework for Cascaded Video Super-Resolution

275

0

0

09 Oct 2025

PAC Learnability in the Presence of Performativity

Lyuben Baltadzhiev

Nikola Konstantinov

134

2

0

09 Oct 2025

FreqCa: Accelerating Diffusion Models via Frequency-Aware Caching

...

103

1

0

09 Oct 2025

MultiCOIN: Multi-Modal COntrollable Video INbetweening

Ali Mahdavi-Amiri

Krishna Kumar Singh

181

1

0

09 Oct 2025

OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference

136

2

0

09 Oct 2025

One Stone with Two Birds: A Null-Text-Null Frequency-Aware Diffusion Models for Text-Guided Image Inpainting

511

4

0

09 Oct 2025

A Honest Cross-Validation Estimator for Prediction Performance

Viswanath Devanarayan

142

0

0

09 Oct 2025

FlexTraj: Image-to-Video Generation with Flexible Point Trajectory Control

245

2

0

09 Oct 2025

Real-Time Motion-Controllable Autoregressive Video Diffusion

227

1

0

09 Oct 2025

Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications

IEEE Access, 2025

Kento Kawaharazuka

264

27

0

08 Oct 2025

SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models

Cheng-Han Chiang

Chung-Ching Lin

Kevin Qinghong Lin

LLMAG ReLM RALM LRM

188

3

0

08 Oct 2025

DynamicEval: Rethinking Evaluation for Dynamic Text-to-Video Synthesis

Aniruddha Mahapatra

Rajiv Soundararajan

Kuldeep Kulkarni

200

0

0

08 Oct 2025

MATRIX: Mask Track Alignment for Interaction-aware Video Generation

106

2

0

08 Oct 2025

Split Conformal Classification with Unsupervised Calibration

Santiago Mazuelas

225

1

0

08 Oct 2025

Medical Vision Language Models as Policies for Robotic Surgery

Conference on Algebraic Informatics (AI), 2025

176

4

0

07 Oct 2025

Mitigating Surgical Data Imbalance with Dual-Prediction Video Diffusion Model

Danush Kumar Venkatesh

Muhammad Abdullah Jamal

144

0

0

07 Oct 2025

Drive&Gen: Co-Evaluating End-to-End Driving and Video Generation Models

...

Dragomir Anguelov

101

0

0

07 Oct 2025

VChain: Chain-of-Visual-Thought for Reasoning in Video Generation

Longxiang Zhang

87

7

0

06 Oct 2025

Paper2Video: Automatic Video Generation from Scientific Papers

Kevin Qinghong Lin

Mike Zheng Shou

234

4

0

06 Oct 2025

LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation

147

1

0

06 Oct 2025

MorphoSim: An Interactive, Controllable, and Editable Language-guided 4D World Simulator

Thivyanth Venkateswaran

VGen SyDa AI4CE

163

0

0

05 Oct 2025

Joint Learning of Pose Regression and Denoising Diffusion with Score Scaling Sampling for Category-level 6D Pose Estimation

227

0

0

05 Oct 2025

Scaling Sequence-to-Sequence Generative Neural Rendering

...

Juan-Manuel Perez-Rua

129

1

0

05 Oct 2025

World-To-Image: Grounding Text-to-Image Generation with Agent-Driven World Knowledge

126

0

0

05 Oct 2025

ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation

...

Jose M. Alvarez

228

3

0

05 Oct 2025

When and Where do Events Switch in Multi-Event Video Generation?

213

0

0

03 Oct 2025

Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction

264

4

0

03 Oct 2025

Fine-Grained GRPO for Precise Preference Alignment in Flow Models

221

3

0

02 Oct 2025

Learning to Generate Rigid Body Interactions with Video Diffusion Models

Ariana Bermúdez

456

0

0

02 Oct 2025

UniVerse: Unleashing the Scene Prior of Video Diffusion Models for Robust Radiance Field Reconstruction

156

0

0

02 Oct 2025

FreeViS: Training-free Video Stylization with Inconsistent References

Vishal M. Patel

206

2

0

02 Oct 2025

Self-Forcing++: Towards Minute-Scale High-Quality Video Generation

229

32

0

02 Oct 2025

VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators

...

146

8

0

01 Oct 2025

Arbitrary Generative Video Interpolation

148

0

0

01 Oct 2025

LVTINO: LAtent Video consisTency INverse sOlver for High Definition Video Restoration

Alessio Spagnoletti

Andrés Almansa

Marcelo Pereyra

171

0

0

01 Oct 2025

InfVSR: Breaking Length Limits of Generic Video Super-Resolution

163

2

0

01 Oct 2025

BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration

136

2

0

01 Oct 2025

Can World Models Benefit VLMs for World Dynamics?

Shanghang Zhang

134

5

0

01 Oct 2025

EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory

...

Cheng-Fang Peng

134

3

0

01 Oct 2025

PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution

Computer Vision and Pattern Recognition (CVPR), 2025

DiffM SupR VGen

296

3

0

30 Sep 2025

Stable Cinemetrics: Structured Taxonomy and Evaluation for Professional Video Generation

Agneet Chatterjee

Maksym Zhuravinskyi

Reshinth Adithyan

DiffM EGVM VGen

139

0

0

30 Sep 2025

MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation

125

1

0

30 Sep 2025

DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder

...

174

2

0

29 Sep 2025

UniVid: The Open-Source Unified Video Model

283

8

0

29 Sep 2025

UniLat3D: Geometry-Appearance Unified Latents for Single-Stage 3D Generation

...

166

3

0

29 Sep 2025
