2311.15127
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
25 November 2023
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
Dominik Lorenz
Yam Levi
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
VGen
HuggingFace (13 upvotes)
Github (25943★)
Papers citing
"Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets"
50 / 967 papers shown
Mobius: Text to Seamless Looping Video Generation via Latent Shift
Xiuli Bi
Jianfei Yuan
Bo Liu
Yanmei Zhang
Xiaodong Cun
Chi-Man Pun
Bin Xiao
DiffM
VGen
159
0
0
27 Feb 2025
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute
Computer Vision and Pattern Recognition (CVPR), 2025
Sotiris Anagnostidis
Gregor Bachmann
Yeongmin Kim
Jonas Kohler
Markos Georgopoulos
A. Sanakoyeu
Yuming Du
Albert Pumarola
Ali K. Thabet
Edgar Schönfeld
315
3
0
27 Feb 2025
High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model
Computer Vision and Pattern Recognition (CVPR), 2025
Mingtao Guo
Guanyu Xing
Yanli Liu
DiffM
VGen
215
4
0
27 Feb 2025
TransVDM: Motion-Constrained Video Diffusion Model for Transparent Video Synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Menghao Li
Zhenghao Zhang
Junchao Liao
Long Qin
Weizhi Wang
DiffM
VGen
199
1
0
26 Feb 2025
X-Dancer: Expressive Music to Human Dance Video Generation
Zeyuan Chen
Hongyi Xu
Guoxian Song
You Xie
Chenxu Zhang
Xiusi Chen
Chao Wang
Di Chang
Linjie Luo
VGen
289
8
0
24 Feb 2025
PuzzleFusion++: Auto-agglomerative 3D Fracture Assembly by Denoise and Verify
International Conference on Learning Representations (ICLR), 2024
Zhengqing Wang
Jiacheng Chen
Yasutaka Furukawa
332
15
0
24 Feb 2025
RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers
Min Zhao
Guande He
Yixiao Chen
Hongzhou Zhu
Chong Li
Jun Zhu
VGen
383
35
0
21 Feb 2025
Text-to-Image Rectified Flow as Plug-and-Play Priors
International Conference on Learning Representations (ICLR), 2024
Xiaofeng Yang
Cheng Chen
Xulei Yang
Fayao Liu
Guosheng Lin
DiffM
348
21
0
21 Feb 2025
Accelerating Diffusion Transformers with Token-wise Feature Caching
International Conference on Learning Representations (ICLR), 2024
Chang Zou
Xuyang Liu
Ting Liu
Siteng Huang
Linfeng Zhang
359
56
0
20 Feb 2025
CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image
ACM Transactions on Graphics (TOG), 2025
Kaixin Yao
Longwen Zhang
Xinhao Yan
Yan Zeng
Qixuan Zhang
Wei Yang
Lan Xu
Jiayuan Gu
Jingyi Yu
349
39
0
18 Feb 2025
VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xinlong Chen
Yuanxing Zhang
Chongling Rao
Yushuo Guan
Qingbin Liu
Fuzheng Zhang
Chengru Song
Qiang Liu
Di Zhang
Tieniu Tan
286
13
0
18 Feb 2025
SayAnything: Audio-Driven Lip Synchronization with Conditional Video Diffusion
Junxian Ma
Shiwen Wang
Jian Yang
Junyi Hu
Jian Liang
Guosheng Lin
Jingbo Chen
Kai Li
Yu Meng
DiffM
VGen
305
5
0
17 Feb 2025
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
Computer Vision and Pattern Recognition (CVPR), 2025
Jingcheng Ni
Yuxin Guo
Yichen Liu
Rui Chen
Lewei Lu
Z. Wu
DiffM
VGen
277
17
0
17 Feb 2025
Phantom: Subject-consistent video generation via cross-modal alignment
Lijie Liu
Tianxiang Ma
Bingchuan Li
Zhuowei Chen
Jiawei Liu
Qian He
Xinglong Wu
DiffM
VGen
411
43
0
16 Feb 2025
Learning Human Skill Generators at Key-Step Levels
Yilu Wu
Chenhui Zhu
Shuai Wang
Hanlin Wang
Jing Wang
Zhaoxiang Zhang
Limin Wang
VGen
346
1
0
12 Feb 2025
History-Guided Video Diffusion
Kiwhan Song
Boyuan Chen
Max Simchowitz
Yilun Du
Russ Tedrake
Vincent Sitzmann
VGen
497
61
0
10 Feb 2025
Pre-Trained Video Generative Models as World Simulators
Haoran He
Yang Zhang
Guanbin Li
Zhihao Xu
Ling Pan
VGen
336
21
0
10 Feb 2025
Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance
Li Hu
Guangyuan Wang
Zhen Shen
Xin Gao
Dechao Meng
Lian Zhuo
Peng Zhang
Bang Zhang
Liefeng Bo
DiffM
VGen
343
34
0
10 Feb 2025
VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer
Xinyu Liu
Ailing Zeng
Wei Xue
Harry Yang
Wenhan Luo
Qifeng Liu
VGen
1.0K
7
0
09 Feb 2025
A Physical Coherence Benchmark for Evaluating Video Generation Models via Optical Flow-guided Frame Prediction
Yongfan Chen
Xiuwen Zhu
Tianyu Li
EGVM
VGen
498
3
0
08 Feb 2025
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
Yueying Zou
Peipei Li
Zekun Li
Huaibo Huang
Xing Cui
Xuannan Liu
Chenghanyu Zhang
Ran He
DeLMO
605
10
0
07 Feb 2025
MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation
Jinbo Xing
Long Mai
Cusuh Ham
Jiahui Huang
Aniruddha Mahapatra
Chi-Wing Fu
T. Wong
Feng Liu
DiffM
VGen
544
21
0
06 Feb 2025
Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach
Yunuo Chen
Junli Cao
Vidit Goel
Sergei Korolev
Jian Ren
Sergey Tulyakov
DiffM
VGen
334
7
0
05 Feb 2025
MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation
Haibo Tong
Zhaoyang Wang
Zhe Chen
Haonian Ji
Shi Qiu
...
Peng Xia
Mingyu Ding
Rafael Rafailov
Chelsea Finn
Huaxiu Yao
EGVM
VGen
599
8
0
03 Feb 2025
Dissecting Submission Limit in Desk-Rejections: A Mathematical Analysis of Fairness in AI Conference Policies
Yuefan Cao
Xiaoyu Li
Yingyu Liang
Zhizhou Sha
Zhenmei Shi
Zhao Song
Jiahao Zhang
287
12
0
02 Feb 2025
Consistent Video Colorization via Palette Guidance
Han Wang
Yuang Zhang
Yuhong Zhang
Lingxiao Lu
Li Song
DiffM
VGen
280
2
0
31 Jan 2025
Improving Tropical Cyclone Forecasting With Video Diffusion Models
Zhibo Ren
Pritthijit Nath
Pancham Shukla
323
0
0
27 Jan 2025
VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking
International Conference on Learning Representations (ICLR), 2025
Runyi Hu
Jing Zhang
You Li
Jiwei Li
Qing Guo
Han Qiu
Tianwei Zhang
WIGM
VGen
432
16
0
24 Jan 2025
Improving Video Generation with Human Feedback
Jie Liu
Gongye Liu
Jiajun Liang
Ziyang Yuan
Xiaokun Liu
...
Fei Yang
Pengfei Wan
Di Zhang
Kun Gai
Yujiu Yang
VGen
EGVM
406
96
0
23 Jan 2025
PreciseCam: Precise Camera Control for Text-to-Image Generation
Computer Vision and Pattern Recognition (CVPR), 2025
Edurne Bernal-Berdun
Ana Serrano
B. Masiá
Matheus Gadelha
Yannick Hold-Geoffroy
Xin Sun
Diego F. F. Gutierrez
DiffM
VGen
187
9
0
22 Jan 2025
Towards Affordance-Aware Articulation Synthesis for Rigged Objects
Yu-Chu Yu
C. Lin
Hsin-Ying Lee
Chaoyang Wang
Longji Xu
Ming-Hsuan Yang
DiffM
AI4CE
255
0
0
21 Jan 2025
GPS as a Control Signal for Image Generation
Computer Vision and Pattern Recognition (CVPR), 2025
Chao Feng
Ziyang Chen
Aleksander Hołyński
Alexei A. Efros
Andrew Owens
DiffM
176
2
0
21 Jan 2025
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
Computer Vision and Pattern Recognition (CVPR), 2025
Sili Chen
Hengkai Guo
Shengnan Zhu
Feihu Zhang
Zilong Huang
Jiashi Feng
Bingyi Kang
MDE
VLM
AuLLM
532
98
0
21 Jan 2025
Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yu Yang
Jianbiao Mei
Yukai Ma
Siliang Du
Wenqing Chen
Yijie Qian
Yuxiang Feng
Yong Liu
427
38
0
20 Jan 2025
Joint Learning of Depth and Appearance for Portrait Image Animation
Xinya Ji
Gaspard Zoss
Prashanth Chandran
Lingchen Yang
Xun Cao
B. Solenthaler
D. Bradley
3DH
MDE
309
1
0
15 Jan 2025
BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations
Computer Vision and Pattern Recognition (CVPR), 2025
Weixi Feng
Chao Liu
Sifei Liu
William Yang Wang
Arash Vahdat
Weili Nie
VGen
DiffM
178
10
0
13 Jan 2025
Qffusion: Controllable Portrait Video Editing via Quadrant-Grid Attention Learning
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2025
Maomao Li
Lijian Lin
Yunfei Liu
Ye Zhu
Yu Li
DiffM
VGen
346
1
0
11 Jan 2025
MEt3R: Measuring Multi-View Consistency in Generated Images
Computer Vision and Pattern Recognition (CVPR), 2025
Mohammad Asim
Christopher Wewer
Thomas Wimmer
Bernt Schiele
J. E. Lenssen
EGVM
3DGS
VGen
223
36
0
10 Jan 2025
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
Computer Vision and Pattern Recognition (CVPR), 2025
Guy Yariv
Yuval Kirstain
Amit Zohar
Shelly Sheynin
Yaniv Taigman
Yossi Adi
Sagie Benaim
Adam Polyak
VGen
DiffM
150
9
0
06 Jan 2025
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
Rui Xie
Yinhong Liu
Penghao Zhou
Chen Zhao
Jun Zhou
Lucas Beerens
Zhenru Zhang
Jian Yang
Zhiyong Yang
Ying Tai
VGen
DiffM
299
21
0
06 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Jiayi Zhang
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
422
32
0
06 Jan 2025
Pointmap-Conditioned Diffusion for Consistent Novel View Synthesis
Thang-Anh-Quan Nguyen
Nathan Piasco
Luis Roldão
Moussâb Bennehar
D. Tsishkou
Laurent Caraffa
J. Tarel
R. Brémond
DiffM
293
3
0
06 Jan 2025
GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking
Weikang Bian
Zhaoyang Huang
Xiaoyu Shi
Yijin Li
Fu-Yun Wang
Jiaming Song
3DGS
VGen
DiffM
320
24
0
05 Jan 2025
TDM: Temporally-Consistent Diffusion Model for All-in-One Real-World Video Restoration
Conference on Multimedia Modeling (MMM), 2025
Yizhou Li
Zihua Liu
Yusuke Monno
Masatoshi Okutomi
DiffM
VGen
181
2
0
04 Jan 2025
Towards Precise Scaling Laws for Video Diffusion Transformers
Computer Vision and Pattern Recognition (CVPR), 2024
Yuanyang Yin
Yaqi Zhao
Mingwu Zheng
Ke Lin
Jiarong Ou
...
Pengfei Wan
Di Zhang
Baoqun Yin
Wentao Zhang
Kun Gai
361
9
0
03 Jan 2025
RORem: Training a Robust Object Remover with Human-in-the-Loop
Computer Vision and Pattern Recognition (CVPR), 2025
Ruibin Li
Tao Yang
Song Guo
Guang Dai
389
11
0
01 Jan 2025
AKiRa: Augmentation Kit on Rays for optical video generation
Computer Vision and Pattern Recognition (CVPR), 2024
Xi Wang
Robin Courant
Marc Christie
Vicky Kalogeiton
VGen
379
10
0
31 Dec 2024
DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT
Xiaotao Hu
Wei Yin
Mingkai Jia
Junyuan Deng
Xiaoyang Guo
Qian Zhang
Xiaoxiao Long
Ping Tan
VGen
322
33
0
31 Dec 2024
Edicho: Consistent Image Editing in the Wild
Qingyan Bai
Hao Ouyang
Yinghao Xu
Qiuyu Wang
Ceyuan Yang
Ka Leong Cheng
Yujun Shen
Qifeng Chen
DiffM
479
5
0
30 Dec 2024
PERSE: Personalized 3D Generative Avatars from A Single Portrait
Computer Vision and Pattern Recognition (CVPR), 2024
Hyunsoo Cha
Inhee Lee
Hanbyul Joo
3DGS
230
7
0
30 Dec 2024