2311.15127
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
25 November 2023
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
Dominik Lorenz
Yam Levi
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
VGen
ArXiv (abs)
PDF
HTML
HuggingFace (13 upvotes)
Github (25943★)
Papers citing
"Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets"
50 / 978 papers shown
MorphoSim: An Interactive, Controllable, and Editable Language-guided 4D World Simulator
Xuehai He
Shijie Zhou
Thivyanth Venkateswaran
Kaizhi Zheng
Ziyu Wan
A. Kadambi
Xin Eric Wang
VGen
SyDa
AI4CE
160
0
0
05 Oct 2025
Scaling Sequence-to-Sequence Generative Neural Rendering
Shikun Liu
Kam Woh Ng
Wonbong Jang
Jiadong Guo
Junlin Han
...
Juan C. Pérez
Zijian Zhou
Chi Phung
Tao Xiang
Juan-Manuel Perez-Rua
VGen
113
0
0
05 Oct 2025
World-To-Image: Grounding Text-to-Image Generation with Agent-Driven World Knowledge
Moo Hyun Son
Jintaek Oh
Sun Bin Mun
Jaechul Roh
Sehyun Choi
120
0
0
05 Oct 2025
Joint Learning of Pose Regression and Denoising Diffusion with Score Scaling Sampling for Category-level 6D Pose Estimation
Seunghyun Lee
Tae-Kyun Kim
DiffM
226
0
0
05 Oct 2025
When and Where do Events Switch in Multi-Event Video Generation?
Ruotong Liao
Guowen Huang
Qing Cheng
Thomas Seidl
Daniel Cremers
Volker Tresp
DiffM
VGen
204
0
0
03 Oct 2025
Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction
Kaisi Guan
Xihua Wang
Zhengfeng Lai
Xin Cheng
Peng Zhang
Xiaojiang Liu
Ruihua Song
Meng Cao
DiffM
256
4
0
03 Oct 2025
UniVerse: Unleashing the Scene Prior of Video Diffusion Models for Robust Radiance Field Reconstruction
Jin Cao
Hongrui Wu
Ziyong Feng
Hujun Bao
Xiaowei Zhou
Sida Peng
VGen
153
0
0
02 Oct 2025
Fine-Grained GRPO for Precise Preference Alignment in Flow Models
Yujie Zhou
Pengyang Ling
Jiazi Bu
Yibin Wang
Yuhang Zang
Jiaqi Wang
Li Niu
Guangtao Zhai
DiffM
213
3
0
02 Oct 2025
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation
Justin Cui
Jie Wu
Ming Li
Tao Yang
Xiaojie Li
Rui Wang
Andrew Bai
Yuanhao Ban
Cho-Jui Hsieh
DiffM
VGen
213
27
0
02 Oct 2025
Learning to Generate Rigid Body Interactions with Video Diffusion Models
David Romero
Ariana Bermúdez
Hao Li
Fabio Pizzati
Ivan Laptev
DiffM
VGen
432
0
0
02 Oct 2025
FreeViS: Training-free Video Stylization with Inconsistent References
Jiacong Xu
Yiqun Mei
Ke Zhang
Vishal M. Patel
DiffM
VGen
198
2
0
02 Oct 2025
Arbitrary Generative Video Interpolation
Guozhen Zhang
Haiguang Wang
C. Wang
Yuan Zhou
Qinglin Lu
Limin Wang
VGen
143
0
0
01 Oct 2025
LVTINO: LAtent Video consisTency INverse sOlver for High Definition Video Restoration
Alessio Spagnoletti
Andrés Almansa
Marcelo Pereyra
DiffM
VGen
171
0
0
01 Oct 2025
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators
Hengtao Li
Pengxiang Ding
Runze Suo
Yihao Wang
Zirui Ge
...
Kexian Yu
Mingyang Sun
Hongyin Zhang
Donglin Wang
Weihua Su
136
6
0
01 Oct 2025
InfVSR: Breaking Length Limits of Generic Video Super-Resolution
Ziqing Zhang
Kai Liu
Zheng Chen
X. Li
Yihao Chen
Bingnan Duan
Linghe Kong
Yulun Zhang
151
2
0
01 Oct 2025
EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory
Jiahao Wang
Luoxin Ye
Taiming Lu
Junfei Xiao
Jiahan Zhang
...
Xijun Liu
Rama Chellappa
Cheng-Fang Peng
Alan Yuille
Jieneng Chen
VGen
129
2
0
01 Oct 2025
BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration
Zhaoyang Li
Dongjun Qian
Kai Su
Qishuai Diao
Xiangyang Xia
Chang Liu
Wenfei Yang
Tianzhu Zhang
Zehuan Yuan
DiffM
VGen
125
2
0
01 Oct 2025
Can World Models Benefit VLMs for World Dynamics?
Kevin Zhang
Kuangzhi Ge
Xiaowei Chi
Renrui Zhang
Shaojun Shi
Zhen Dong
Sirui Han
Shanghang Zhang
VGen
VLM
127
4
0
01 Oct 2025
PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution
Computer Vision and Pattern Recognition (CVPR), 2025
S. Du
Menghan Xia
Chang Liu
Xintao Wang
Jing Wang
Pengfei Wan
Di Zhang
Xiangyang Ji
DiffM
SupR
VGen
271
3
0
30 Sep 2025
Stable Cinemetrics : Structured Taxonomy and Evaluation for Professional Video Generation
Agneet Chatterjee
Rahim Entezari
Maksym Zhuravinskyi
Maksim Lapin
Reshinth Adithyan
Amit Raj
Chitta Baral
Yezhou Yang
Varun Jampani
DiffM
EGVM
VGen
137
0
0
30 Sep 2025
MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation
Chenhui Zhu
Yilu Wu
Shuai Wang
Gangshan Wu
Limin Wang
DiffM
VGen
123
1
0
30 Sep 2025
FlashI2V: Fourier-Guided Latent Shifting Prevents Conditional Image Leakage in Image-to-Video Generation
Yunyang Ge
Xinhua Cheng
ChengShu Zhao
Xianyi He
Shenghai Yuan
Bin Lin
Bin Zhu
Li Yuan
VGen
VLM
188
0
0
29 Sep 2025
Enhancing Physical Plausibility in Video Generation by Reasoning the Implausibility
Yutong Hao
Chen Chen
Ajmal Saeed Mian
Chang Xu
Daochang Liu
DiffM
VGen
136
3
0
29 Sep 2025
UniLat3D: Geometry-Appearance Unified Latents for Single-Stage 3D Generation
Guanjun Wu
Jiemin Fang
Chen Yang
Sikuang Li
Taoran Yi
...
Xiaopeng Zhang
Wei Wei
Wenyu Liu
Xinggang Wang
Qi Tian
164
2
0
29 Sep 2025
Attention Surgery: An Efficient Recipe to Linearize Your Video Diffusion Transformer
Mohsen Ghafoorian
Denis Korzhenkov
A. Habibian
VGen
303
4
0
29 Sep 2025
World-Env: Leveraging World Model as a Virtual Environment for VLA Post-Training
Junjin Xiao
Y. Yang
Xinyuan Chang
Ronghan Chen
Feng Xiong
Mu Xu
Wei-Shi Zheng
Qing Zhang
VLM
257
7
0
29 Sep 2025
Asymmetric VAE for One-Step Video Super-Resolution Acceleration
Jianze Li
Yong Guo
Yulun Zhang
Xiaokang Yang
DiffM
94
0
0
29 Sep 2025
PanoWorld-X: Generating Explorable Panoramic Worlds via Sphere-Aware Video Diffusion
Yuyang Yin
HaoXiang Guo
Fangfu Liu
Mengyu Wang
Hanwen Liang
Eric Li
Yikai Wang
Xiaojie Jin
Yao-Min Zhao
Yunchao Wei
VGen
101
0
0
29 Sep 2025
RapidMV: Leveraging Spatio-Angular Representations for Efficient and Consistent Text-to-Multi-View Synthesis
Seungwook Kim
Yichun Shi
Kejie Li
Minsu Cho
Peng Wang
136
0
0
29 Sep 2025
Rolling Forcing: Autoregressive Long Video Diffusion in Real Time
Kunhao Liu
Wenbo Hu
Jiale Xu
Ying Shan
Shijian Lu
DiffM
VGen
150
22
0
29 Sep 2025
DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
Junyu Chen
Wenkun He
Yuchao Gu
Yuyang Zhao
Jincheng Yu
...
Haocheng Xi
Ligeng Zhu
Enze Xie
Song Han
Han Cai
VGen
170
2
0
29 Sep 2025
UniVid: The Open-Source Unified Video Model
Jiabin Luo
Junhui Lin
Zeyu Zhang
Biao Wu
Meng Fang
Ling-Hao Chen
Hao Tang
VGen
272
7
0
29 Sep 2025
GaussianLens: Localized High-Resolution Reconstruction via On-Demand Gaussian Densification
Yijia Weng
Zhicheng Wang
Songyou Peng
Saining Xie
Howard Zhou
Leonidas Guibas
148
0
0
29 Sep 2025
ReLumix: Extending Image Relighting to Video via Video Diffusion Models
Lezhong Wang
Shutong Jin
Ruiqi Cui
Anders Bjorholm Dahl
J. Frisvad
Siavash Bigdeli
VGen
101
0
0
28 Sep 2025
Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing
Rohit Chowdhury
Aniruddha Bala
Rohan Jaiswal
Siddharth Roheda
AAML
VGen
95
0
0
27 Sep 2025
Syncphony: Synchronized Audio-to-Video Generation with Diffusion Transformers
Jibin Song
Mingi Kwon
Jaeseok Jeong
Youngjung Uh
DiffM
VGen
1.4K
0
0
26 Sep 2025
EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer
Zhehao Dong
Xiaofeng Wang
Zheng Zhu
Y. Wang
Yang Wang
...
Runqi Ouyang
Wenkang Qin
Xinze Chen
Yun Ye
Guan Huang
VGen
LM&Ro
145
4
0
26 Sep 2025
HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models
Seyedmorteza Sadat
Farnood Salehi
Romann M. Weber
DiffM
160
0
0
26 Sep 2025
Drag4D: Align Your Motion with Text-Driven 3D Scene Generation
Minjun Kang
Inkyu Shin
Taeyeop Lee
In So Kweon
Kuk-Jin Yoon
117
0
0
26 Sep 2025
EgoDemoGen: Novel Egocentric Demonstration Generation Enables Viewpoint-Robust Manipulation
Yuan Xu
Jiabing Yang
X. Wang
Yixiang Chen
Zheng Zhu
...
Shuo Lu
Jing Liu
Nianfeng Liu
Yan Huang
Liang Wang
VGen
132
3
0
26 Sep 2025
MoWM: Mixture-of-World-Models for Embodied Planning via Latent-to-Pixel Feature Modulation
Yu Shang
Yangcheng Yu
Xin Zhang
Xin Jin
Haisheng Su
Wei Wu
Yong Li
VGen
167
1
0
26 Sep 2025
NewtonGen: Physics-Consistent and Controllable Text-to-Video Generation via Neural Newtonian Dynamics
Yu Yuan
Xijun Wang
Tharindu Wickremasinghe
Zeeshan Nadir
Bole Ma
Stanley H. Chan
DiffM
VGen
PINN
1.5K
8
0
25 Sep 2025
X-Streamer: Unified Human World Modeling with Audiovisual Interaction
You Xie
Tianpei Gu
Zenan Li
Chenxu Zhang
Guoxian Song
Xiaochen Zhao
C. Liang
Jianwen Jiang
Hongyi Xu
Linjie Luo
VGen
173
3
0
25 Sep 2025
What Happens Next? Anticipating Future Motion by Generating Point Trajectories
Gabrijel Boduljak
Laurynas Karazija
Iro Laina
Christian Rupprecht
Andrea Vedaldi
VGen
112
1
0
25 Sep 2025
PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation
Chen Wang
Chuhao Chen
Yiming Huang
Zhiyang Dou
Yuan Liu
Jiatao Gu
Lingjie Liu
DiffM
VGen
PINN
607
8
0
24 Sep 2025
Frame-based Equivariant Diffusion Models for 3D Molecular Generation
Mohan Guo
Cong Liu
Patrick Forré
DiffM
161
1
0
23 Sep 2025
Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
Sherwin Bahmani
Tianchang Shen
Jiawei Ren
Jiahui Huang
Yifeng Jiang
...
Zan Gojcic
Sanja Fidler
Huan Ling
Jun Gao
Xuanchi Ren
VGen
148
6
0
23 Sep 2025
OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models
Jinshu Chen
Xinghui Li
Xu Bai
Tianxiang Ma
Pengze Zhang
...
Gen Li
Lijie Liu
Songtao Zhao
Bingchuan Li
Qian He
DiffM
VGen
152
1
0
22 Sep 2025
I2VWM: Robust Watermarking for Image to Video Generation
Guanjie Wang
Zehua Ma
Han Fang
Weiming Zhang
WIGM
VGen
195
0
0
22 Sep 2025
Seg4Diff: Unveiling Open-Vocabulary Segmentation in Text-to-Image Diffusion Transformers
Chaehyun Kim
Heeseong Shin
Eunbeen Hong
Heeji Yoon
Anurag Arnab
Paul Hongsuck Seo
Sunghwan Hong
Seungryong Kim
176
6
0
22 Sep 2025