Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

25 November 2023
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
Dominik Lorenz
Yam Levi
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
VGen
ArXiv (abs), PDF, HTML, HuggingFace (13 upvotes), GitHub (25943★)

Papers citing "Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets"

50 / 967 papers shown
Title
GraspDiffusion: Synthesizing Realistic Whole-body Hand-Object Interaction
Patrick Kwon
Chen Chen
Hanbyul Joo
253
6
0
17 Oct 2024
DepthSplat: Connecting Gaussian Splatting and Depth
Computer Vision and Pattern Recognition (CVPR), 2024
Haofei Xu
Songyou Peng
Fangjinhua Wang
Hermann Blum
Dániel Baráth
Andreas Geiger
Marc Pollefeys
3DGS MDE
376
113
0
17 Oct 2024
Shaping a Stabilized Video by Mitigating Unintended Changes for Concept-Augmented Video Editing
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Mingce Guo
Jingxuan He
Shengeng Tang
Zhangye Wang
Lechao Cheng
VGen DiffM
280
2
0
16 Oct 2024
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
International Conference on Learning Representations (ICLR), 2024
Ziyue Li
Wanrong Zhu
MoE
440
39
0
14 Oct 2024
Tokenizing Motion: A Generative Approach for Scene Dynamics Compression
Shanzhi Yin
Zihan Zhang
Bolin Chen
Shiqi Wang
Yan Ye
VGen
183
3
0
13 Oct 2024
Semantic Score Distillation Sampling for Compositional Text-to-3D Generation
L. Yang
Zixiang Zhang
Junlin Han
Bohan Zeng
Runjia Li
Philip Torr
Wentao Zhang
296
6
0
11 Oct 2024
Distillation of Discrete Diffusion through Dimensional Correlations
Satoshi Hayakawa
Yuhta Takida
Masaaki Imaizumi
Hiromi Wakaki
Yuki Mitsufuji
DiffM
503
14
0
11 Oct 2024
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation
International Conference on Learning Representations (ICLR), 2024
Jiahao Cui
Hui Li
Yao Yao
Hao Zhu
Hanlin Shang
Kaihui Cheng
Hang Zhou
Siyu Zhu
Jingdong Wang
DiffM VGen
286
73
0
10 Oct 2024
Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content
Computer Vision and Pattern Recognition (CVPR), 2024
Qiuheng Wang
Yukai Shi
Jiarong Ou
Ruoxin Chen
Ke Lin
...
Mingwu Zheng
Xin Tao
Fei Yang
Pengfei Wan
Di Zhang
VGen
381
81
0
10 Oct 2024
Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation
Qingwen Bu
Hongyang Li
Li Chen
Jisong Cai
Jia Zeng
Heming Cui
Maoqing Yao
Yu Qiao
363
36
0
10 Oct 2024
Progressive Autoregressive Video Diffusion Models
Desai Xie
Zhan Xu
Yicong Hong
Hao Tan
Difan Liu
Feng Liu
Arie E. Kaufman
Yang Zhou
DiffM VGen
280
37
0
10 Oct 2024
AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation
Yukang Cao
Liang Pan
Kai Han
Kwan-Yee K. Wong
Ziwei Liu
VGen
320
18
0
09 Oct 2024
Pyramidal Flow Matching for Efficient Video Generative Modeling
International Conference on Learning Representations (ICLR), 2024
Yang Jin
Zhicheng Sun
Ningyuan Li
Kun Xu
K. Xu
...
Nan Zhuang
Quzhe Huang
Yang Song
Yadong Mu
Zhouchen Lin
VGen
433
184
0
08 Oct 2024
ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler
International Conference on Learning Representations (ICLR), 2024
Serin Yang
Taesung Kwon
Jong Chul Ye
VGen DiffM
346
9
0
08 Oct 2024
Learning Efficient and Effective Trajectories for Differential Equation-based Image Restoration
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Zhiyu Zhu
Jinhui Hou
Hui Liu
H. Zeng
Junhui Hou
311
3
0
07 Oct 2024
Elucidating the Design Choice of Probability Paths in Flow Matching for Forecasting
Soon Hoe Lim
Yijin Wang
Annan Yu
Emma Hart
Michael W. Mahoney
Xiaoye S. Li
N. Benjamin Erichson
AI4TS
431
7
0
04 Oct 2024
IoT-LLM: a framework for enhancing Large Language Model reasoning from real-world sensor data
Tuo An
Yunjiao Zhou
Han Zou
Jianfei Yang
LRM
342
20
0
03 Oct 2024
MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences
International Conference on Learning Representations (ICLR), 2024
Genta Indra Winata
David Anugraha
Lucky Susanto
Garry Kuwanto
Derry Wijaya
511
17
0
03 Oct 2024
Loong: Generating Minute-level Long Videos with Autoregressive Language Models
Yuqing Wang
Tianwei Xiong
Daquan Zhou
Zhijie Lin
Yang Zhao
Bingyi Kang
Jiashi Feng
Xihui Liu
VGen
332
64
0
03 Oct 2024
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
International Conference on Learning Representations (ICLR), 2024
Seyedmorteza Sadat
Otmar Hilliges
Romann M. Weber
DiffM
339
43
0
03 Oct 2024
Text2PDE: Latent Diffusion Models for Accessible Physics Simulation
International Conference on Learning Representations (ICLR), 2024
Anthony Zhou
Zijie Li
Michael Schneier
John R Buchanan Jr
Amir Barati Farimani
AI4CE DiffM
390
12
0
02 Oct 2024
Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining
International Conference on Learning Representations (ICLR), 2024
Jie Cheng
Ruixi Qiao
Gang Xiong
Binhua Li
Yingwei Ma
Binhua Li
Yongbin Li
Yisheng Lv
OffRL OnRL LM&Ro
289
7
0
01 Oct 2024
Stable Video Portraits
European Conference on Computer Vision (ECCV), 2024
Mirela Ostrek
Justus Thies
VGen DiffM
268
3
0
26 Sep 2024
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM AuLLM
406
20
0
26 Sep 2024
Generative Object Insertion in Gaussian Splatting with a Multi-View Diffusion Model
Visual Informatics (VI), 2024
Hongliang Zhong
Can Wang
Jingbo Zhang
Jing Liao
3DGS DiffM
279
5
0
25 Sep 2024
Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models
A. Popov
Alperen Degirmenci
David Wehr
Shashank Hegde
Ryan Oldja
...
David Nistér
Urs Muller
Ruchi Bhargava
Stan Birchfield
Nikolai Smolyanskiy
585
21
0
25 Sep 2024
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling
Computer Vision and Pattern Recognition (CVPR), 2024
Yifang Men
Yuan Yao
Miaomiao Cui
Liefeng Bo
DiffM
425
49
0
24 Sep 2024
Multi-modal Generative AI: Multi-modal LLMs, Diffusions, and the Unification
X. Wang
Yuwei Zhou
Bin Huang
Hong Chen
Wenwu Zhu
DiffM
418
9
0
23 Sep 2024
Dormant: Defending against Pose-driven Human Image Animation
Jiachen Zhou
Mingsi Wang
Tianlin Li
Guozhu Meng
Kai Chen
429
5
0
22 Sep 2024
OSV: One Step is Enough for High-Quality Image to Video Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Xiaofeng Mao
Zhengkai Jiang
Fu-Yun Wang
Wenbing Zhu
Hao Chen
Mingmin Chi
Yabiao Wang
Wenhan Luo
DiffM VGen
358
22
0
17 Sep 2024
InteractPro: A Unified Framework for Motion-Aware Image Composition
Weijing Tao
Xiaofeng Yang
Miaomiao Cui
Guosheng Lin
DiffM
297
2
0
16 Sep 2024
DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation
Wei Wu
Xi Guo
Weixuan Tang
Tingxuan Huang
Chiyu Wang
Dongyue Chen
C. Ding
VGen
172
13
0
09 Sep 2024
DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes
Jianbiao Mei
T. Hu
Xuemeng Yang
Licheng Wen
Yu Yang
Tiantian Wei
Yukai Ma
Min Dou
Botian Shi
Yong Liu
VGen DiffM
476
15
0
06 Sep 2024
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
Neural Information Processing Systems (NeurIPS), 2024
Yunze Man
Shuhong Zheng
Zhipeng Bao
M. Hebert
Liang-Yan Gui
Yu-Xiong Wang
480
30
0
05 Sep 2024
Human-VDM: Learning Single-Image 3D Human Gaussian Splatting from Video Diffusion Models
Zhibin Liu
Haoye Dong
Aviral Chharia
Hefeng Wu
3DGS VGen
188
4
0
04 Sep 2024
CyberHost: Taming Audio-driven Avatar Diffusion Model with Region Codebook Attention
Gaojie Lin
Jianwen Jiang
Chao Liang
Tianyun Zhong
Jiaqi Yang
Yanbo Zheng
VGen DiffM
498
32
0
03 Sep 2024
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model
Fan Liu
Wenqiang Sun
Hanyang Wang
Yikai Wang
Haowen Sun
Junliang Ye
Jun Zhang
Yueqi Duan
VGen
454
93
0
29 Aug 2024
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation
International Conference on Learning Representations (ICLR), 2024
Xiaojuan Wang
Boyang Zhou
Brian L. Curless
Ira Kemelmacher-Shlizerman
Aleksander Holynski
Steven M. Seitz
DiffM
275
29
0
27 Aug 2024
Diffusion Models Are Real-Time Game Engines
International Conference on Learning Representations (ICLR), 2024
Dani Valevski
Yaniv Leviathan
Moab Arar
Shlomi Fruchter
DiffM VGen AI4CE
469
152
0
27 Aug 2024
Atlas Gaussians Diffusion for 3D Generation
International Conference on Learning Representations (ICLR), 2024
Haitao Yang
Yuan Dong
Hanwen Jiang
Dejia Xu
Georgios Pavlakos
Qixing Huang
3DGS
555
10
0
23 Aug 2024
Real-Time Video Generation with Pyramid Attention Broadcast
International Conference on Learning Representations (ICLR), 2024
Xuanlei Zhao
Xiaolong Jin
Kai Wang
Yang You
VGen DiffM
454
75
0
22 Aug 2024
DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework
Zhifei Xie
Daniel Tang
Dingwei Tan
Jacques Klein
Tegawendé F. Bissyandé
Saad Ezzini
VGen
220
23
0
21 Aug 2024
TrackGo: A Flexible and Efficient Method for Controllable Video Generation
AAAI Conference on Artificial Intelligence (AAAI), 2024
Haitao Zhou
Chuang Wang
Rui Nie
Jinxiao Lin
Dongdong Yu
Qian Yu
Changhu Wang
VGen DiffM
490
26
0
21 Aug 2024
Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation
Liu He
Yizhi Song
Hejun Huang
Pinxin Liu
Yunlong Tang
Daniel G. Aliaga
Xin Zhou
DiffM VGen
390
9
0
19 Aug 2024
RealCustom++: Representing Images as Real Textual Word for Real-Time Customization
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Zhendong Mao
Mengqi Huang
Fei Ding
Mingcong Liu
Qian He
Xiaojun Chang
DiffM
446
14
0
19 Aug 2024
FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Jiasong Feng
Ao Ma
Jing Wang
Bo Cheng
Xiaodan Liang
DiffM VGen
372
10
0
15 Aug 2024
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
International Conference on Learning Representations (ICLR), 2024
Zhuoyi Yang
Jiayan Teng
Wendi Zheng
Ming Ding
Shiyu Huang
...
Weihan Wang
Yean Cheng
Xiaotao Gu
Yuxiao Dong
Jie Tang
DiffM VGen
799
1,221
0
12 Aug 2024
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics
Ruining Li
Chuanxia Zheng
Christian Rupprecht
Andrea Vedaldi
DiffM VGen
311
20
0
08 Aug 2024
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation
Zhiyu Tan
Xiaomeng Yang
Luozheng Qin
Hao Li
VGen
244
37
0
05 Aug 2024
SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement
Computer Vision and Pattern Recognition (CVPR), 2024
Mark Boss
Zixuan Huang
Aaryaman Vasishta
Varun Jampani
3DGS
288
58
0
01 Aug 2024