Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

25 November 2023
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
Dominik Lorenz
Yam Levi
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
    VGen
arXiv 2311.15127 (abs, PDF, HTML) · HuggingFace (13 upvotes) · GitHub (25943★)

Papers citing "Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets"

50 / 967 papers shown
Human Action CLIPs: Detecting AI-generated Human Motion
Matyáš Boháček
Hany Farid
411
4
0
30 Nov 2024
ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration
Computer Vision and Pattern Recognition (CVPR), 2024
Chaojun Ni
Guosheng Zhao
Xiaofeng Wang
Zheng Hua Zhu
Wenkang Qin
...
Kun Zhan
Fu Liu
Xianpeng Lang
Xingang Wang
Wenjun Mei
VGen
780
47
0
29 Nov 2024
Deepfake Media Generation and Detection in the Generative AI Era: A Survey and Outlook
Florinel-Alin Croitoru
Andrei Iulian Hiji
Vlad Hondru
Nicolae-Cătălin Ristea
Paul Irofti
Marius Popescu
Cristian Rusu
Radu Tudor Ionescu
Fahad Shahbaz Khan
Mubarak Shah
377
15
0
29 Nov 2024
AerialGo: Walking-through City View Generation from Aerial Perspectives
Fuqiang Zhao
Yijing Guo
Siyuan Yang
Xi Chen
Luo Wang
Yongjian Luo
Yujiao Shi
Jingyi Yu
281
1
0
29 Nov 2024
Fleximo: Towards Flexible Text-to-Human Motion Video Generation
Yuhang Zhang
Yuan Zhou
Zeyu Liu
Yuxuan Cai
Qiuyue Wang
Aidong Men
Huan Yang
VGen DiffM
230
3
0
29 Nov 2024
Motion Modes: What Could Happen Next?
Computer Vision and Pattern Recognition (CVPR), 2024
Karran Pandey
Matheus Gadelha
Yannick Hold-Geoffroy
Karan Singh
Niloy J. Mitra
Paul Guerrero
VGen DiffM
318
5
0
29 Nov 2024
Track Anything Behind Everything: Zero-Shot Amodal Video Object Segmentation
Finlay G. C. Hudson
W. Smith
VOS VLM
313
0
0
28 Nov 2024
Open-Sora Plan: Open-Source Large Video Generation Model
Bin Lin
Yunyang Ge
Xinhua Cheng
Zongjian Li
Bin Zhu
...
Zhang Pan
Xing Zhou
Shaoling Dong
Yonghong Tian
Li-xin Yuan
VLM VGen
375
185
0
28 Nov 2024
SPAgent: Adaptive Task Decomposition and Model Selection for General Video Generation and Editing
Rong-Cheng Tu
Wenhao Sun
Zhao Jin
Jingyi Liao
Jiaxing Huang
Dacheng Tao
VGen DiffM
317
12
0
28 Nov 2024
OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Hui Li
Mingwang Xu
Yun Zhan
Shan Mu
Jiaye Li
...
Yukang Chen
Tan Chen
Mao Ye
Jingdong Wang
Siyu Zhu
VGen
492
37
0
28 Nov 2024
PCDreamer: Point Cloud Completion Through Multi-view Diffusion Priors
Computer Vision and Pattern Recognition (CVPR), 2024
Guangshun Wei
Qi Liu
Long Ma
Chen Wang
Yuanfeng Zhou
Changjian Li
1.1K
10
0
28 Nov 2024
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
Computer Vision and Pattern Recognition (CVPR), 2024
Feng Liu
Shiwei Zhang
Xiaofeng Wang
Yujie Wei
Haonan Qiu
Yuzhong Zhao
Yingya Zhang
Qixiang Ye
Fang Wan
VGen AI4TS
432
78
0
28 Nov 2024
Diffusion Self-Distillation for Zero-Shot Customized Image Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Shengqu Cai
Eric Ryan Chan
Yunzhi Zhang
Leonidas Guibas
Jiajun Wu
Gordon Wetzstein
277
34
0
27 Nov 2024
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2024
Rundi Wu
Ruiqi Gao
Ben Poole
Alex Trevithick
Changxi Zheng
Jonathan T. Barron
Aleksander Hołyński
VGen
378
100
0
27 Nov 2024
Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling
Computer Vision and Pattern Recognition (CVPR), 2024
J. Hyung
Kinam Kim
Susung Hong
M. Kim
Jaegul Choo
VGen
266
14
0
27 Nov 2024
HiFiVFS: High Fidelity Video Face Swapping
Xu Chen
Keke He
Junwei Zhu
Yanhao Ge
Wei Li
Chengjie Wang
VGen DiffM
320
3
0
27 Nov 2024
Individual Content and Motion Dynamics Preserved Pruning for Video Diffusion Models
Yiming Wu
Zhenghao Chen
Huan Wang
Dong Xu
DiffM VGen
421
2
0
27 Nov 2024
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
Computer Vision and Pattern Recognition (CVPR), 2024
Zongjian Li
Bin Lin
Yang Ye
Liuhan Chen
Xinhua Cheng
Shenghai Yuan
Li-xin Yuan
VGen DiffM
466
30
0
26 Nov 2024
Generative Omnimatte: Learning to Decompose Video into Layers
Computer Vision and Pattern Recognition (CVPR), 2024
Yao-Chih Lee
Erika Lu
Sarah Rumbley
Michal Geyer
Jia-Bin Huang
Tali Dekel
Forrester Cole
DiffM VGen
410
13
0
25 Nov 2024
MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model
Computer Vision and Pattern Recognition (CVPR), 2024
Chenjie Cao
Chaohui Yu
Shang Liu
Fan Wang
Xiangyang Xue
Yanwei Fu
421
10
0
25 Nov 2024
Sonic: Shifting Focus to Global Audio Perception in Portrait Animation
Computer Vision and Pattern Recognition (CVPR), 2024
Xiaozhong Ji
Xiaobin Hu
Zhihong Xu
Junwei Zhu
Chuming Lin
...
Donghao Luo
Yi Chen
Qin Lin
Qinglin Lu
Chengjie Wang
VGen
373
43
0
25 Nov 2024
Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing
Kaifeng Gao
Jiaxin Shi
Hanwang Zhang
Chunping Wang
Jun Xiao
Long Chen
VGen DiffM
430
17
0
25 Nov 2024
Gaussian Scenes: Pose-Free Sparse-View Scene Reconstruction using Depth-Enhanced Diffusion Priors
Soumava Paul
Prakhar Kaushik
Yaoyao Liu
3DGS DiffM
1.0K
4
0
24 Nov 2024
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Weijia Wu
Mingyu Liu
Zeyu Zhu
Xi Xia
Haoen Feng
Wen Wang
Kevin Qinghong Lin
Chunhua Shen
Mike Zheng Shou
DiffM VGen
399
12
0
22 Nov 2024
PhysFlow: Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation
Computer Vision and Pattern Recognition (CVPR), 2024
Zhuoman Liu
Weicai Ye
Yan Luximon
Pengfei Wan
Di Zhang
VGen AI4CE
433
16
0
21 Nov 2024
Latent Knowledge-Guided Video Diffusion for Scientific Phenomena Generation from a Single Initial Frame
Qinglong Cao
Xirui Li
Ding Wang
Yuntian Chen
Chao Ma
Yunbo Wang
DiffM VGen
521
6
0
18 Nov 2024
SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input
Computer Vision and Pattern Recognition (CVPR), 2024
Zhen Lv
Yangqi Long
Congzhentao Huang
Cao Li
Chengfei Lv
Hao Ren
Dian Zheng
DiffM VGen MDE
399
9
0
18 Nov 2024
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations
Computer Vision and Pattern Recognition (CVPR), 2024
Hmrishav Bandyopadhyay
Yi-Zhe Song
DiffM VGen
213
6
0
16 Nov 2024
OnlyFlow: Optical Flow based Motion Conditioning for Video Diffusion Models
Mathis Koroglu
Hugo Caselles-Dupré
Guillaume Jeanneret Sanmiguel
Matthieu Cord
VGen DiffM
300
7
0
15 Nov 2024
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
Computer Vision and Pattern Recognition (CVPR), 2024
Rang Meng
Xingyu Zhang
Yuming Li
Chenguang Ma
415
48
0
15 Nov 2024
I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength
International Conference on Learning Representations (ICLR), 2024
Wanquan Feng
Jiawei Liu
Pengqi Tu
Tianhao Qi
Mingzhen Sun
Tianxiang Ma
Mingcong Liu
Siyu Zhou
Qian He
VGen
482
22
0
10 Nov 2024
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
Wenqiang Sun
Shuo Chen
Fan Liu
Zilong Chen
Yueqi Duan
Jun Zhang
Yikai Wang
VGen
333
98
0
07 Nov 2024
StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration
Panwen Hu
Jin Jiang
Jianqi Chen
Mingfei Han
Shengcai Liao
Xiaojun Chang
Xiaodan Liang
VGen DiffM
333
18
0
07 Nov 2024
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
International Conference on Learning Representations (ICLR), 2024
Koichi Namekata
Sherwin Bahmani
Ziyi Wu
Yash Kant
Igor Gilitschenski
David B. Lindell
VGen
531
36
0
07 Nov 2024
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
Wenhao Wang
Yue Yang
VGen
276
5
0
05 Nov 2024
GenXD: Generating Any 3D and 4D Scenes
International Conference on Learning Representations (ICLR), 2024
Yuyang Zhao
Chung-Ching Lin
Kevin Qinghong Lin
Zhiwen Yan
Linjie Li
Zhiyong Yang
Jianfeng Wang
G. Lee
Lijuan Wang
VGen
354
39
0
04 Nov 2024
On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection
Neural Information Processing Systems (NeurIPS), 2024
Xiufeng Song
Xiao Guo
Junxuan Zhang
Qirui Li
Lei Bai
Xiaoming Liu
Guangtao Zhai
Xiaohong Liu
VGen DiffM
592
28
0
31 Oct 2024
Video prediction using score-based conditional density estimation
P. Fiquet
Eero P. Simoncelli
AI4TS
132
0
0
30 Oct 2024
ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
Anurag Bagchi
Zhipeng Bao
Yu-Xiong Wang
P. Tokmakov
Martial Hebert
VOS
251
2
0
30 Oct 2024
One Prompt to Verify Your Models: Black-Box Text-to-Image Models Verification via Non-Transferable Adversarial Attacks
Ji Guo
Wenbo Jiang
Rui Zhang
Guoming Lu
Hongwei Li
AAML
480
1
0
30 Oct 2024
ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
International Conference on Learning Representations (ICLR), 2024
Zongyi Li
Shujie Hu
Shujie Liu
Long Zhou
Jeongsoo Choi
Lingwei Meng
Xun Guo
Jiajian Li
H. Ling
Furu Wei
VGen DiffM
348
25
0
27 Oct 2024
Framer: Interactive Frame Interpolation
International Conference on Learning Representations (ICLR), 2024
Wen Wang
Qiuyu Wang
Kecheng Zheng
Hao Ouyang
Zhekai Chen
Biao Gong
Hao Chen
Yujun Shen
Chunhua Shen
VGen
209
19
0
24 Oct 2024
Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances
International Conference on Learning Representations (ICLR), 2024
Shilin Lu
Zihan Zhou
Jiayou Lu
Yuanzhi Zhu
A. Kong
WIGM
521
72
0
24 Oct 2024
FreeVS: Generative View Synthesis on Free Driving Trajectory
International Conference on Learning Representations (ICLR), 2024
Qitai Wang
Lue Fan
Yuqi Wang
Yuntao Chen
Rundong Wang
VGen
216
29
0
23 Oct 2024
SINGAPO: Single Image Controlled Generation of Articulated Parts in Objects
International Conference on Learning Representations (ICLR), 2024
Jiayi Liu
Denys Iliash
Angel X. Chang
Manolis Savva
Ali Mahdavi-Amiri
429
34
0
21 Oct 2024
EVA: An Embodied World Model for Future Video Anticipation
Yatian Wang
Hengyuan Zhang
Chun-Kai Fan
Xingqun Qi
Rongyu Zhang
...
Chi-Min Chan
Wei Xue
Wenhan Luo
Shanghang Zhang
VGen
213
16
0
20 Oct 2024
FrameBridge: Improving Image-to-Video Generation with Bridge Models
Yuji Wang
Zehua Chen
Xiaoyu Chen
Jun-Jie Zhu
Jianfei Chen
DiffM VGen
945
9
0
20 Oct 2024
Assessing Open-world Forgetting in Generative Image Model Customization
Héctor Laria
Alex Gomez-Villa
Imad Eddine Marouf
Bogdan Raducanu
VLM DiffM
264
0
0
18 Oct 2024
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control
Yujie Wei
Shiwei Zhang
Hangjie Yuan
Xiang Wang
Haonan Qiu
...
Fan Liu
Zhizhong Huang
Jiaxin Ye
Yingya Zhang
Hongming Shan
DiffM VGen
287
29
0
17 Oct 2024
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation
Computer Vision and Pattern Recognition (CVPR), 2024
Guosheng Zhao
Chaojun Ni
Xiaofeng Wang
Zheng Zhu
Xinming Zhang
...
Xinze Chen
Boyuan Wang
Youyi Zhang
Wenjun Mei
Xingang Wang
VGen
447
75
0
17 Oct 2024