2311.15127
Cited By
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
25 November 2023
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
Dominik Lorenz
Yam Levi
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
VGen
HuggingFace (13 upvotes)
Github (25943★)
Papers citing
"Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets"
50 / 967 papers shown
Title
ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation
Guosheng Zhao
Xiaofeng Wang
Chaojun Ni
Zheng Zhu
Wenkang Qin
Guan Huang
Xingang Wang
381
17
0
24 Mar 2025
AdaWorld: Learning Adaptable World Models with Latent Actions
Shenyuan Gao
Siyuan Zhou
Yilun Du
Jun Zhang
Chuang Gan
VGen
513
31
0
24 Mar 2025
InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment
Computer Vision and Pattern Recognition (CVPR), 2025
Yaojie Lu
Qichao Wang
H. Cao
Xierui Wang
Xiaoyin Xu
Min Zhang
287
6
0
24 Mar 2025
Video-T1: Test-Time Scaling for Video Generation
Fan Liu
Hanyang Wang
Yimo Cai
Kaiyan Zhang
Xiaohang Zhan
Yueqi Duan
DiffM
VGen
402
15
0
24 Mar 2025
EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation
Qiang Qu
Ming Li
Xiaoming Chen
Tongliang Liu
DiffM
VGen
279
2
0
24 Mar 2025
Target-Aware Video Diffusion Models
Taeksoo Kim
Hanbyul Joo
DiffM
VGen
390
3
0
24 Mar 2025
HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation
Computer Vision and Pattern Recognition (CVPR), 2025
Zunnan Xu
Zhentao Yu
Zixiang Zhou
Jun Zhou
Xiaoyu Jin
...
Chengfei Cai
Shiyu Tang
Qin Lin
Xiu Li
Qinglin Lu
DiffM
VGen
397
29
0
24 Mar 2025
Aether: Geometric-Aware Unified World Modeling
Aether Team
Haoyi Zhu
Yanjie Wang
Jianjun Zhou
Wenzheng Chang
...
Zizun Li
Junyi Chen
Chunhua Shen
Jiangmiao Pang
Tong He
DiffM
VGen
449
40
0
24 Mar 2025
ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering
Kaisi Guan
Zhengfeng Lai
Yizhou Sun
Peng Zhang
Wei Liu
Kieran Liu
Meng Cao
Ruihua Song
VGen
290
3
0
21 Mar 2025
Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks
Bhishma Dedhia
David Bourgin
Krishna Kumar Singh
Yuheng Li
Yan Kang
Zhan Xu
N. Jha
Yixiao Liu
DiffM
VGen
332
0
0
21 Mar 2025
Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer
Qingyu Shi
Jianzong Wu
Jinbin Bai
Jing Zhang
Lu Qi
Xuelong Li
Yunhai Tong
231
4
0
21 Mar 2025
TruthLens: Visual Grounding for Universal DeepFake Reasoning
Rohit Kundu
Shan Jia
Vishal Mohanty
Athula Balachandran
Amit K. Roy-Chowdhury
348
3
0
20 Mar 2025
VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling
Hyojun Go
Byeongjun Park
Hyelin Nam
Byung-Hoon Kim
Hyungjin Chung
Changick Kim
3DGS
VGen
365
7
0
20 Mar 2025
SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation
Chun-Han Yao
Yiming Xie
Vikram S. Voleti
Huaizu Jiang
Varun Jampani
3DGS
VGen
466
20
0
20 Mar 2025
ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos
Haolin Yang
Feilong Tang
Ming Hu
Yulong Li
Junjie Guo
...
Zelin Peng
Junjun He
Zongyuan Ge
Imran Razzak
DiffM
VGen
764
7
0
20 Mar 2025
EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation
Computer Vision and Pattern Recognition (CVPR), 2025
Zihao Zhang
Haoran Chen
Haoyu Zhao
Guansong Lu
Yanwei Fu
Hang Xu
Zuxuan Wu
VGen
DiffM
346
7
0
20 Mar 2025
PoseTraj: Pose-Aware Trajectory Control in Video Diffusion
Computer Vision and Pattern Recognition (CVPR), 2025
Longbin Ji
Lei Zhong
Pengfei Wei
Changjian Li
DiffM
VGen
223
3
0
20 Mar 2025
BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers
Computer Vision and Pattern Recognition (CVPR), 2025
Hui Zhang
Tingwei Gao
Jie Shao
Zuxuan Wu
304
9
0
20 Mar 2025
Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models
Marc Benedí San Millán
Angela Dai
Matthias Nießner
DiffM
253
3
0
20 Mar 2025
Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion
Computer Vision and Pattern Recognition (CVPR), 2025
Zhou Zhenglin
Ma Fan
Fan Hehe
Chua Tat-Seng
VGen
478
1
0
20 Mar 2025
MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving
Haiguang Wang
Daqi Liu
Hongwei Xie
Haisong Liu
Enhui Ma
Kaicheng Yu
Limin Wang
Bing Wang
VGen
277
4
0
20 Mar 2025
MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance
Quanhao Li
Zhen Xing
Rui Wang
Hui Zhang
Jingdong Sun
Zuxuan Wu
VGen
401
15
0
20 Mar 2025
Uncertainty-Aware Diffusion Guided Refinement of 3D Scenes
Sarosij Bose
Arindam Dutta
Sayak Nag
Junge Zhang
Jiachen Li
Konstantinos Karydis
Amit K. Roy-Chowdhury
339
0
0
19 Mar 2025
Efficient Personalization of Quantized Diffusion Model without Backpropagation
Computer Vision and Pattern Recognition (CVPR), 2025
H. Seo
Wongi Jeong
Kyungryeol Lee
Se Young Chun
DiffM
MQ
345
1
0
19 Mar 2025
Temporal Regularization Makes Your Video Generator Stronger
Harold Haodong Chen
Haojian Huang
Xianfeng Wu
Yexin Liu
Yajing Bai
Wen-Jie Shu
Harry Yang
Ser-Nam Lim
VGen
308
7
0
19 Mar 2025
MusicInfuser: Making Video Diffusion Listen and Dance
Susung Hong
Ira Kemelmacher-Shlizerman
Brian L. Curless
Steven M. Seitz
VGen
285
1
0
18 Mar 2025
Advances in 4D Generation: A Survey
Qiaowei Miao
Kehan Li
Jinsheng Quan
Zhiyuan Min
Shaojie Ma
Yichao Xu
Yi Yang
Ping Liu
Yawei Luo
493
2
0
18 Mar 2025
MOSAIC: Generating Consistent, Privacy-Preserving Scenes from Multiple Depth Views in Multi-Room Environments
Zhixuan Liu
H. Zhu
R. Chen
Jonathan M Francis
Soonmin Hwang
Jiangning Zhang
Jean Oh
VGen
1.1K
2
0
18 Mar 2025
SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model
Computer Vision and Pattern Recognition (CVPR), 2025
Yucheng Mao
Boyang Wang
Nilesh Kulkarni
Jeong Joon Park
DiffM
296
1
0
18 Mar 2025
Concat-ID: Towards Universal Identity-Preserving Video Synthesis
Yong Zhong
Zhuoyi Yang
Jiayan Teng
Xiaohan Zhang
Chongxuan Li
VGen
336
17
0
18 Mar 2025
Impossible Videos
Zechen Bai
Hai Ci
Mike Zheng Shou
EGVM
VGen
294
7
0
18 Mar 2025
Generative Gaussian Splatting: Generating 3D Scenes with Video Diffusion Priors
Katja Schwarz
Norman Mueller
Peter Kontschieder
3DGS
279
10
0
17 Mar 2025
Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception
Dingkang Liang
Dingyuan Zhang
Xin Zhou
Sifan Tu
Tianrui Feng
Xiaofan Li
Yumeng Zhang
Mingyang Du
Xiao Tan
Xiang Bai
226
7
0
17 Mar 2025
FiVE: A Fine-grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models
Minghan Li
C. Xie
Yongpeng Wu
Lei Zhang
Ming Wang
DiffM
VGen
401
6
0
17 Mar 2025
CNCast: Leveraging 3D Swin Transformer and DiT for Enhanced Regional Weather Forecasting
Hongli Liang
Yuanting Zhang
Qingye Meng
Shuangshuang He
Xingyuan Yuan
186
0
0
16 Mar 2025
TACO: Taming Diffusion for in-the-wild Video Amodal Completion
Ruijie Lu
Yixin Chen
Yu Liu
Jiaxiang Tang
Junfeng Ni
Diwen Wan
Gang Zeng
Siyuan Huang
DiffM
VGen
407
8
0
15 Mar 2025
SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering
Byeongjun Park
Hyojun Go
Hyelin Nam
Byung-Hoon Kim
Hyungjin Chung
Changick Kim
VGen
LLMSV
356
5
0
15 Mar 2025
MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation
Sungwoo Cho
J. Choi
Sungnyun Kim
Se-Young Yun
281
0
0
14 Mar 2025
Harnessing Frequency Spectrum Insights for Image Copyright Protection Against Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2025
Zhenguang Liu
Chao Shuai
Shaojing Fan
Ziping Dong
Jinwu Hu
Zhongjie Ba
Kui Ren
WIGM
302
0
0
14 Mar 2025
MTV-Inpaint: Multi-Task Long Video Inpainting
Shiyuan Yang
Zheng Gu
Liang Hou
Xin Tao
Pengfei Wan
Xiaodong Chen
Jing Liao
DiffM
167
5
0
14 Mar 2025
Understanding Flatness in Generative Models: Its Role and Benefits
Taehwan Lee
Kyeongkook Seo
Jaejun Yoo
Sung Whan Yoon
DiffM
284
1
0
14 Mar 2025
Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model
Haoyang Huang
Guoqing Ma
Nan Duan
Xing Chen
Changyi Wan
...
Xiangyu Zhang
Yi Xiu
Yibo Zhu
H. Shum
Daxin Jiang
VGen
181
14
0
14 Mar 2025
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
Jianhong Bai
Menghan Xia
Xiao Fu
Xintao Wang
Lianrui Mu
...
Zuozhu Liu
Haoji Hu
Xiang Bai
Pengfei Wan
Di Zhang
DiffM
VGen
377
87
0
14 Mar 2025
HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models
Ziqin Zhou
Yifan Yang
Yue Yang
Tianyu He
Houwen Peng
Kai Qiu
Qi Dai
Lili Qiu
Chong Luo
Lingqiao Liu
DiffM
VGen
151
4
0
14 Mar 2025
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
International Conference on Learning Representations (ICLR), 2024
Zhengyao Lv
Chenyang Si
Junhao Song
Zhenyu Yang
Ping Luo
Yu Qiao
Kwan-Yee K. Wong
VGen
DiffM
342
44
0
13 Mar 2025
VideoMerge: Towards Training-free Long Video Generation
Siyang Zhang
Harry Yang
Ser-Nam Lim
DiffM
VGen
175
2
0
13 Mar 2025
MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction
Yingshuang Zou
Yikang Ding
Chuanrui Zhang
Jiazhe Guo
Bohan Li
Xiaoyang Lyu
Feiyang Tan
Xiaojuan Qi
Haoqian Wang
3DGS
188
2
0
13 Mar 2025
^RFLAV: Rolling Flow matching for infinite Audio Video generation
Alex Ergasti
Giuseppe Tarollo
Filippo Botti
Tomaso Fontanini
Claudio Ferrari
Massimo Bertozzi
Andrea Prati
VGen
192
2
0
13 Mar 2025
Motion Anything: Any to Motion Generation
Zeyu Zhang
Yiran Wang
Wei Mao
Danning Li
Rui Zhao
Biao Wu
Zirui Song
Bohan Zhuang
Ian Reid
Leonid Sigal
DiffM
VGen
225
18
0
13 Mar 2025
V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes
Yanming Zhang
Jun-Kun Chen
Jipeng Lyu
Yu-Xiong Wang
DiffM
VGen
277
2
0
13 Mar 2025