Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

25 November 2023
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
Dominik Lorenz
Yam Levi
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
    VGen
ArXiv (abs) | PDF | HTML | HuggingFace (13 upvotes) | GitHub (25943★)

Papers citing "Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets"

50 / 1,008 papers shown
Online Video Depth Anything: Temporally-Consistent Depth Prediction with Low Memory Consumption
Johann-Friedrich Feiden
Tim Küchler
Denis Zavadski
Bogdan Savchynskyy
Carsten Rother
VLM
121
0
0
10 Oct 2025
UniVideo: Unified Understanding, Generation, and Editing for Videos
Cong Wei
Quande Liu
Zixuan Ye
Qiulin Wang
Xintao Wang
Pengfei Wan
Kun Gai
Wenhu Chen
VGen
262
14
0
09 Oct 2025
NovaFlow: Zero-Shot Manipulation via Actionable Flow from Generated Videos
Hongyu Li
Lingfeng Sun
Yafei Hu
Duy Ta
Jennifer Barry
George Konidaris
Jiahui Fu
133
5
0
09 Oct 2025
VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning
Minghong Cai
Qiulin Wang
Zongli Ye
Wenze Liu
Quande Liu
Weicai Ye
X. Wang
Pengfei Wan
Kun Gai
Xiangyu Yue
VGen
93
0
0
09 Oct 2025
An approach for systematic decomposition of complex llm tasks
Tianle Zhou
Jiakai Xu
G. Liu
Jiaxiang Liu
Haonan Wang
Eugene Wu
148
0
0
09 Oct 2025
UniMMVSR: A Unified Multi-Modal Framework for Cascaded Video Super-Resolution
Shian Du
Menghan Xia
Chang-rui Liu
Quande Liu
Xintao Wang
Pengfei Wan
Xiangyang Ji
VGen, SupR
275
0
0
09 Oct 2025
PAC Learnability in the Presence of Performativity
Ivan Kirev
Lyuben Baltadzhiev
Nikola Konstantinov
134
2
0
09 Oct 2025
FreqCa: Accelerating Diffusion Models via Frequency-Aware Caching
Jiacheng Liu
Peiliang Cai
Qinming Zhou
Yuqi Lin
Deyang Kong
...
Haowen Xu
Chang Zou
J. Tang
S. Zheng
Linfeng Zhang
103
1
0
09 Oct 2025
MultiCOIN: Multi-Modal COntrollable Video INbetweening
Maham Tanveer
Yang Zhou
Simon Niklaus
Ali Mahdavi-Amiri
Hao Zhang
Krishna Kumar Singh
Nanxuan Zhao
DiffM, VGen
181
1
0
09 Oct 2025
OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference
Yuzhe Gu
Xiyu Liang
Jiaojiao Zhao
Enmao Diao
136
2
0
09 Oct 2025
One Stone with Two Birds: A Null-Text-Null Frequency-Aware Diffusion Models for Text-Guided Image Inpainting
Haipeng Liu
Yang Wang
M. Y. Wang
DiffM
511
4
0
09 Oct 2025
A Honest Cross-Validation Estimator for Prediction Performance
Tianyu Pan
Vincent Z. Yu
Viswanath Devanarayan
Lu Tian
142
0
0
09 Oct 2025
FlexTraj: Image-to-Video Generation with Flexible Point Trajectory Control
Zhiyuan Zhang
Can Wang
Dongdong Chen
Jing Liao
VGen
245
2
0
09 Oct 2025
Real-Time Motion-Controllable Autoregressive Video Diffusion
Kesen Zhao
Jiaxin Shi
B. Zhu
Junbao Zhou
Xiaolong Shen
Yuan Zhou
Qianru Sun
Hanwang Zhang
VGen
227
1
0
09 Oct 2025
Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications
IEEE Access, 2025
Kento Kawaharazuka
Jihoon Oh
Jun Yamada
Ingmar Posner
Yuke Zhu
LM&Ro
264
27
0
08 Oct 2025
SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models
Cheng-Han Chiang
Xiaofei Wang
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
Shujie Liu
Zhendong Wang
Zhengyuan Yang
Hung-yi Lee
Lijuan Wang
LLMAG, ReLM, RALM, LRM
188
3
0
08 Oct 2025
DynamicEval: Rethinking Evaluation for Dynamic Text-to-Video Synthesis
Nithin C. Babu
Aniruddha Mahapatra
Harsh Rangwani
Rajiv Soundararajan
Kuldeep Kulkarni
EGVM, VGen
200
0
0
08 Oct 2025
MATRIX: Mask Track Alignment for Interaction-aware Video Generation
Siyoon Jin
S. Kim
Dahyun Chung
J. Lee
Hyunwook Choi
Jisu Nam
J. Kim
S. Kim
VGen
106
2
0
08 Oct 2025
Split Conformal Classification with Unsupervised Calibration
Santiago Mazuelas
225
1
0
08 Oct 2025
Medical Vision Language Models as Policies for Robotic Surgery
Conference on Algebraic Informatics (AI), 2025
Akshay Muppidi
Martin Radfar
176
4
0
07 Oct 2025
Mitigating Surgical Data Imbalance with Dual-Prediction Video Diffusion Model
Danush Kumar Venkatesh
Adam Schmidt
Muhammad Abdullah Jamal
Omid Mohareri
VGen, MedIm
144
0
0
07 Oct 2025
Drive&Gen: Co-Evaluating End-to-End Driving and Video Generation Models
Jiahao Wang
Zhenpei Yang
Yijing Bai
Yingwei Li
Yuliang Zou
...
Zehao Zhu
Jyh-Jing Hwang
Dragomir Anguelov
Mingxing Tan
C. Jiang
VGen
101
0
0
07 Oct 2025
VChain: Chain-of-Visual-Thought for Reasoning in Video Generation
Longxiang Zhang
Ning Yu
Gordon Chen
Haonan Qiu
P. Debevec
Ziwei Liu
VGen, LRM
87
7
0
06 Oct 2025
Paper2Video: Automatic Video Generation from Scientific Papers
Zeyu Zhu
Kevin Qinghong Lin
Mike Zheng Shou
VGen
234
4
0
06 Oct 2025
LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation
Yang Xiao
Gen Li
Kaiyuan Deng
Yushu Wu
Zheng Zhan
Yanzhi Wang
Xiaolong Ma
Bo Hui
VGen
147
1
0
06 Oct 2025
MorphoSim: An Interactive, Controllable, and Editable Language-guided 4D World Simulator
Xuehai He
Shijie Zhou
Thivyanth Venkateswaran
Kaizhi Zheng
Ziyu Wan
A. Kadambi
Xin Eric Wang
VGen, SyDa, AI4CE
163
0
0
05 Oct 2025
Joint Learning of Pose Regression and Denoising Diffusion with Score Scaling Sampling for Category-level 6D Pose Estimation
Seunghyun Lee
Tae-Kyun Kim
DiffM
227
0
0
05 Oct 2025
Scaling Sequence-to-Sequence Generative Neural Rendering
Shikun Liu
Kam Woh Ng
Wonbong Jang
Jiadong Guo
Junlin Han
...
Juan C. Pérez
Zijian Zhou
Chi Phung
Tao Xiang
Juan-Manuel Perez-Rua
VGen
129
1
0
05 Oct 2025
World-To-Image: Grounding Text-to-Image Generation with Agent-Driven World Knowledge
Moo Hyun Son
Jintaek Oh
Sun Bin Mun
Jaechul Roh
Sehyun Choi
126
0
0
05 Oct 2025
ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation
J. Wu
Xuanchi Ren
Tianchang Shen
Tianshi Cao
Kai He
...
Jose M. Alvarez
Jun Gao
Sanja Fidler
Zian Wang
Huan Ling
DiffM, VGen
228
3
0
05 Oct 2025
When and Where do Events Switch in Multi-Event Video Generation?
Ruotong Liao
Guowen Huang
Qing Cheng
Thomas Seidl
Daniel Cremers
Volker Tresp
DiffM, VGen
213
0
0
03 Oct 2025
Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction
Kaisi Guan
Xihua Wang
Zhengfeng Lai
Xin Cheng
Peng Zhang
Xiaojiang Liu
Ruihua Song
Meng Cao
DiffM
264
4
0
03 Oct 2025
Fine-Grained GRPO for Precise Preference Alignment in Flow Models
Yujie Zhou
Pengyang Ling
Jiazi Bu
Yibin Wang
Yuhang Zang
Jiaqi Wang
Li Niu
Guangtao Zhai
DiffM
221
3
0
02 Oct 2025
Learning to Generate Rigid Body Interactions with Video Diffusion Models
David Romero
Ariana Bermúdez
Hao Li
Fabio Pizzati
Ivan Laptev
DiffM, VGen
456
0
0
02 Oct 2025
UniVerse: Unleashing the Scene Prior of Video Diffusion Models for Robust Radiance Field Reconstruction
Jin Cao
Hongrui Wu
Ziyong Feng
Hujun Bao
Xiaowei Zhou
Sida Peng
VGen
156
0
0
02 Oct 2025
FreeViS: Training-free Video Stylization with Inconsistent References
Jiacong Xu
Yiqun Mei
Ke Zhang
Vishal M. Patel
DiffM, VGen
206
2
0
02 Oct 2025
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation
Justin Cui
Jie Wu
Ming Li
Tao Yang
Xiaojie Li
Rui Wang
Andrew Bai
Yuanhao Ban
Cho-Jui Hsieh
DiffM, VGen
229
32
0
02 Oct 2025
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators
Hengtao Li
Pengxiang Ding
Runze Suo
Yihao Wang
Zirui Ge
...
Kexian Yu
Mingyang Sun
Hongyin Zhang
Donglin Wang
Weihua Su
146
8
0
01 Oct 2025
Arbitrary Generative Video Interpolation
Guozhen Zhang
Haiguang Wang
C. Wang
Yuan Zhou
Qinglin Lu
Limin Wang
VGen
148
0
0
01 Oct 2025
LVTINO: LAtent Video consisTency INverse sOlver for High Definition Video Restoration
Alessio Spagnoletti
Andrés Almansa
Marcelo Pereyra
DiffM, VGen
171
0
0
01 Oct 2025
InfVSR: Breaking Length Limits of Generic Video Super-Resolution
Ziqing Zhang
Kai Liu
Zheng Chen
X. Li
Yihao Chen
Bingnan Duan
Linghe Kong
Yulun Zhang
163
2
0
01 Oct 2025
BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration
Zhaoyang Li
Dongjun Qian
Kai Su
Qishuai Diao
Xiangyang Xia
Chang Liu
Wenfei Yang
Tianzhu Zhang
Zehuan Yuan
DiffM, VGen
136
2
0
01 Oct 2025
Can World Models Benefit VLMs for World Dynamics?
Kevin Zhang
Kuangzhi Ge
Xiaowei Chi
Renrui Zhang
Shaojun Shi
Zhen Dong
Sirui Han
Shanghang Zhang
VGen, VLM
134
5
0
01 Oct 2025
EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory
Jiahao Wang
Luoxin Ye
Taiming Lu
Junfei Xiao
Jiahan Zhang
...
Xijun Liu
Rama Chellappa
Cheng-Fang Peng
Alan Yuille
Jieneng Chen
VGen
134
3
0
01 Oct 2025
PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution
Computer Vision and Pattern Recognition (CVPR), 2025
S. Du
Menghan Xia
Chang Liu
Xintao Wang
Jing Wang
Pengfei Wan
Di Zhang
Xiangyang Ji
DiffM, SupR, VGen
296
3
0
30 Sep 2025
Stable Cinemetrics : Structured Taxonomy and Evaluation for Professional Video Generation
Agneet Chatterjee
Rahim Entezari
Maksym Zhuravinskyi
Maksim Lapin
Reshinth Adithyan
Amit Raj
Chitta Baral
Yezhou Yang
Varun Jampani
DiffM, EGVM, VGen
139
0
0
30 Sep 2025
MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation
Chenhui Zhu
Yilu Wu
Shuai Wang
Gangshan Wu
Limin Wang
DiffM, VGen
125
1
0
30 Sep 2025
DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
Junyu Chen
Wenkun He
Yuchao Gu
Yuyang Zhao
Jincheng Yu
...
Haocheng Xi
Ligeng Zhu
Enze Xie
Song Han
Han Cai
VGen
174
2
0
29 Sep 2025
UniVid: The Open-Source Unified Video Model
Jiabin Luo
Junhui Lin
Zeyu Zhang
Biao Wu
Meng Fang
Ling-Hao Chen
Hao Tang
VGen
283
8
0
29 Sep 2025
UniLat3D: Geometry-Appearance Unified Latents for Single-Stage 3D Generation
Guanjun Wu
Jiemin Fang
Chen Yang
Sikuang Li
Taoran Yi
...
Xiaopeng Zhang
Wei Wei
Wenyu Liu
Xinggang Wang
Qi Tian
166
3
0
29 Sep 2025