2311.15127
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
25 November 2023
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
Dominik Lorenz
Yam Levi
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
VGen
ArXiv (abs)
PDF
HTML
HuggingFace (13 upvotes)
Github (25943★)
Papers citing
"Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets"
50 / 967 papers shown
Title
Discriminator-Free Direct Preference Optimization for Video Diffusion
Haoran Cheng
Qide Dong
Liang Peng
Zhizhou Sha
Weiguo Feng
Jinghui Xie
Zhao Song
Shilei Wen
Xiaofei He
Boxi Wu
VGen
798
2
0
11 Apr 2025
In-2-4D: Inbetweening from Two Single-View Images to 4D Generation
Sauradip Nag
Daniel Cohen-Or
Hao Zhang
Ali Mahdavi-Amiri
DiffM
VGen
406
4
0
11 Apr 2025
Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos
Rundong Luo
Matthew Wallingford
Ali Farhadi
Noah Snavely
Wei-Chiu Ma
VGen
358
5
0
10 Apr 2025
Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction
Zeren Jiang
Chuanxia Zheng
Iro Laina
Diane Larlus
Andrea Vedaldi
VGen
372
35
0
10 Apr 2025
FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution
Gene Chou
Wenqi Xian
Guandao Yang
Mohamed Abdelfattah
Bharath Hariharan
Noah Snavely
Ning Yu
P. Debevec
MDE
427
3
0
09 Apr 2025
DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation
Wangbo Zhao
Yizeng Han
Jiasheng Tang
Xiaojiang Peng
Hao Luo
Yibing Song
Gao Huang
Fan Wang
Yang You
492
3
0
09 Apr 2025
RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism
International Conference on Multimedia Retrieval (ICMR), 2025
E. Peruzzo
Dejia Xu
Xingqian Xu
Humphrey Shi
Andrii Zadaianchuk
DiffM
VGen
275
2
0
09 Apr 2025
EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation
Computer Vision and Pattern Recognition (CVPR), 2025
Diljeet Jagpal
Xi Chen
Vinay P. Namboodiri
DiffM
VGen
135
0
0
09 Apr 2025
POMATO: Marrying Pointmap Matching with Temporal Motion for Dynamic 3D Reconstruction
Songyan Zhang
Yongtao Ge
Jinyuan Tian
Guangkai Xu
Hao Chen
Chen Lv
Chunhua Shen
3DPC
266
14
0
08 Apr 2025
Gaussian Mixture Flow Matching Models
Hansheng Chen
Kai Zhang
Hao Tan
Zexiang Xu
Fujun Luan
Leonidas Guibas
Gordon Wetzstein
Sai Bi
DiffM
419
7
0
07 Apr 2025
FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
Mengchao Wang
Qiang Wang
Fan Jiang
Yaqi Fan
Yunpeng Zhang
Yonggang Qi
Kun Zhao
Mu Xu
DiffM
VGen
180
39
0
07 Apr 2025
Multi-identity Human Image Animation with Structural Video Diffusion
Zhenzhi Wang
Yongqian Li
Yanhong Zeng
Yuwei Guo
Dahua Lin
Tianfan Xue
Bo Dai
VGen
209
4
0
05 Apr 2025
DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion
Maksim Siniukov
Di Chang
Minh Tran
Hongkun Gong
Ashutosh Chaubey
Mohammad Soleymani
DiffM
VGen
290
2
0
05 Apr 2025
Can You Count to Nine? A Human Evaluation Benchmark for Counting Limits in Modern Text-to-Video Models
Xuyang Guo
Zekai Huang
Jiayan Huo
Yingyu Liang
Zhenmei Shi
Zhao Song
Jiahao Zhang
ALM
VGen
450
12
0
05 Apr 2025
HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration
Boyuan Wang
Runqi Ouyang
Xiaofeng Wang
Zheng Zhu
Guosheng Zhao
...
X. Zhang
Guan Huang
Xingang Wang
Lihong Liu
Xingang Wang
3DGS
598
8
0
04 Apr 2025
SkyReels-A2: Compose Anything in Video Diffusion Transformers
Zhengcong Fei
Didong Li
Di Qiu
Jiadong Wang
Yikun Dou
...
Jinfeng Xu
Mingyuan Fan
Guibin Chen
Yang Li
Yahui Zhou
DiffM
VGen
300
31
0
03 Apr 2025
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation
Fa-Ting Hong
Zunnan Xu
Zixiang Zhou
Zhiqiang Zhang
Xiu Li
Qin Lin
Qinglin Lu
D. Xu
DiffM
VGen
423
8
0
03 Apr 2025
MG-Gen: Single Image to Motion Graphics Generation
Takahiro Shirakawa
Tomoyuki Suzuki
Takuto Narumoto
Daichi Haraguchi
VGen
515
0
0
03 Apr 2025
OmniCam: Unified Multimodal Video Generation via Camera Control
Xiaoda Yang
Jiayang Xu
Kaixuan Luan
Xinyu Zhan
Hongshun Qiu
...
Shuai Yang
Li Zhang
Checheng Yu
Cewu Lu
Lixin Yang
DiffM
VGen
256
3
0
03 Apr 2025
Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model
Computer Vision and Pattern Recognition (CVPR), 2025
Shengjun Zhang
Jinzhao Li
Xin Fei
Hao Liu
Yueqi Duan
DiffM
3DGS
VGen
247
5
0
03 Apr 2025
Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets
Chuning Zhu
Raymond Yu
S. Feng
Benjamin Burchfiel
Paarth Shah
Abhishek Gupta
VGen
437
40
0
03 Apr 2025
Domain Guidance: A Simple Transfer Approach for a Pre-trained Diffusion Model
International Conference on Learning Representations (ICLR), 2025
Jincheng Zhong
Xiangcheng Zhang
Chao Guo
Mingsheng Long
222
3
0
02 Apr 2025
FlowR: Flowing from Sparse to Dense 3D Reconstructions
Tobias Fischer
Samuel Rota Buló
Yung-Hsu Yang
Nikhil Varma Keetha
Lorenzo Porzi
Norman Muller
Katja Schwarz
Jonathon Luiten
Marc Pollefeys
Peter Kontschieder
3DGS
322
7
0
02 Apr 2025
GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors
Tian-Xing Xu
Xiangjun Gao
Wenbo Hu
Xiaoyu Li
Song-Hai Zhang
Mingyu Ding
VGen
MDE
404
17
0
01 Apr 2025
Beyond Static Scenes: Camera-controllable Background Generation for Human Motion
Mingshuai Yao
Mengting Chen
Qinye Zhou
Yujiao Shi
Ming-Yu Liu
...
Chen Ju
Shuai Xiao
Qingwen Liu
Jinsong Lan
Wangmeng Zuo
DiffM
VGen
334
3
0
01 Apr 2025
Can Test-Time Scaling Improve World Foundation Model?
Wenyan Cong
Hanqing Zhu
Peihao Wang
Bangya Liu
Dejia Xu
Kevin Wang
David Z. Pan
Yan Wang
Zhiwen Fan
Ziyi Wang
261
7
0
31 Mar 2025
HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation
Computer Vision and Pattern Recognition (CVPR), 2025
Boyuan Wang
Xiaofeng Wang
Chaojun Ni
Guosheng Zhao
Zhiqin Yang
...
Yukun Zhou
Xinze Chen
Guan Huang
Lihong Liu
Xingang Wang
VGen
340
16
0
31 Mar 2025
JointTuner: Appearance-Motion Adaptive Joint Training for Customized Video Generation
Fangda Chen
Shanshan Zhao
Chuanfu Xu
Long Lan
VGen
319
3
0
31 Mar 2025
VideoGen-Eval: Agent-based System for Video Generation Evaluation
Yuhang Yang
Ke Fan
Siyang Song
Hongxiang Li
Ailing Zeng
FeiLin Han
Wei-dong Zhai
Wen Liu
Yang Cao
Zheng-jun Zha
EGVM
VGen
357
7
0
30 Mar 2025
SketchVideo: Sketch-based Video Generation and Editing
Computer Vision and Pattern Recognition (CVPR), 2025
Feng-Lin Liu
Hongbo Fu
Xintao Wang
Weicai Ye
Pengfei Wan
Di Zhang
Lin Gao
DiffM
VGen
270
5
0
30 Mar 2025
MoCha: Towards Movie-Grade Talking Character Synthesis
Cong Wei
Bo Sun
Haoyu Ma
Ji Hou
F. Xu
...
Kunpeng Li
Tingbo Hou
Animesh Sinha
Peter Vajda
Lei Ma
VGen
750
19
0
30 Mar 2025
Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion
Jangho Park
Taesung Kwon
Jong Chul Ye
VGen
429
9
0
28 Mar 2025
DiTFastAttnV2: Head-wise Attention Compression for Multi-Modality Diffusion Transformers
Hao Zhang
R. Su
Zhihang Yuan
Pengtao Chen
Mingzhu Shen
Yibo Fan
Shengen Yan
Guohao Dai
Yu Wang
255
9
0
28 Mar 2025
Semantix: An Energy Guided Sampler for Semantic Style Transfer
International Conference on Learning Representations (ICLR), 2025
Huiang He
Minghui Hu
C. Zheng
Chaoyue Wang
Tat-Jen Cham
DiffM
232
1
0
28 Mar 2025
EchoFlow: A Foundation Model for Cardiac Ultrasound Image and Video Generation
Hadrien Reynaud
Alberto Gomez
Paul Leeson
Qingjie Meng
Bernhard Kainz
MedIm
151
3
0
28 Mar 2025
DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation
Haoyu Zhao
Zhongang Qi
Cong Wang
Qingping Zheng
Guansong Lu
Fei Chen
Hang Xu
Zuxuan Wu
DiffM
VGen
257
2
0
27 Mar 2025
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
Dian Zheng
Ziqi Huang
Hongbo Liu
Kai Zou
Yinan He
...
Jingwen He
Wei-Shi Zheng
Botian Shi
Yu Qiao
Ziwei Liu
EGVM
VGen
281
76
0
27 Mar 2025
GenFusion: Closing the Loop between Reconstruction and Generation via Videos
Computer Vision and Pattern Recognition (CVPR), 2025
Sibo Wu
Congrong Xu
Binbin Huang
Andreas Geiger
Anpei Chen
VGen
988
15
0
27 Mar 2025
TransDiffSBDD: Causality-Aware Multi-Modal Structure-Based Drug Design
Xiuyuan Hu
Guoqing Liu
Can Chen
Yang Zhao
Hao Zhang
Xue Liu
254
3
0
26 Mar 2025
Video Motion Graphs
Haiyang Liu
Zhan Xu
Fa-Ting Hong
Hsin-Ping Huang
Yi Zhou
Yang Zhou
DiffM
VGen
317
2
0
26 Mar 2025
FB-4D: Spatial-Temporal Coherent Dynamic 3D Content Generation with Feature Banks
Jinwei Li
Huan-ang Gao
Wenyi Li
Haohan Chi
Chenyu Liu
...
Yao Yao
Jingwei Zhao
Hongyang Li
Yikai Wang
Hao Zhao
323
2
0
26 Mar 2025
Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models
Prin Phunyaphibarn
Phillip Y. Lee
Jaihoon Kim
Minhyuk Sung
DiffM
437
5
0
26 Mar 2025
EGVD: Event-Guided Video Diffusion Model for Physically Realistic Large-Motion Frame Interpolation
Ziran Zhang
Xiaohui Li
Yihao Liu
Yujin Wang
Yueting Chen
Tianfan Xue
Shi Guo
DiffM
VGen
236
2
0
26 Mar 2025
VPO: Aligning Text-to-Video Generation Models with Prompt Optimization
Jiale Cheng
Ruiliang Lyu
Xiaohan Zhang
Xiao-Chang Liu
Jiazheng Xu
...
Zhuoyi Yang
Yuxiao Dong
Jie Tang
Han Wang
Minlie Huang
VGen
245
12
0
26 Mar 2025
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers
Computer Vision and Pattern Recognition (CVPR), 2025
Jiazhi Guan
Kaisiyuan Wang
Zhiliang Xu
Quanwei Yang
Yasheng Sun
...
Errui Ding
Jiadong Wang
Youjian Zhao
Hang Zhou
Ziwei Liu
VGen
232
1
0
25 Mar 2025
Mask²DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation
Computer Vision and Pattern Recognition (CVPR), 2025
Tianhao Qi
Jianlong Yuan
Wanquan Feng
Shancheng Fang
Jiawei Liu
Siyu Zhou
Qian He
Hongtao Xie
Yongdong Zhang
DiffM
VGen
232
8
0
25 Mar 2025
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better
Computer Vision and Pattern Recognition (CVPR), 2025
Zihang Lai
Andrea Vedaldi
181
3
0
25 Mar 2025
EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models
Yufei Cai
Hu Han
Yuxiang Wei
Shiguang Shan
Xilin Chen
DiffM
VGen
211
1
0
25 Mar 2025
MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation
Computer Vision and Pattern Recognition (CVPR), 2025
Yukang Lin
Hokit Fung
Jianjin Xu
Zeping Ren
Adela S.M. Lau
Guosheng Yin
Xiu Li
VGen
269
12
0
25 Mar 2025
PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model
Computer Vision and Pattern Recognition (CVPR), 2025
Mingju Gao
Yike Pan
Huan-ang Gao
Zongzheng Zhang
Wenyi Li
Hao Dong
Hao Tang
Li Yi
Hao Zhao
VGen
221
6
0
25 Mar 2025