Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

25 November 2023
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
Dominik Lorenz
Yam Levi
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
    VGen
ArXiv (abs) | PDF | HTML | HuggingFace (13 upvotes) | Github (25943★)

Papers citing "Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets"

50 / 967 papers shown
Generative Video Propagation
Computer Vision and Pattern Recognition (CVPR), 2024
Shaoteng Liu
Tianyu Wang
Jiadong Wang
Qing Liu
Zhifei Zhang
...
Rui Wang
Bei Yu
Zhe Lin
Seunggeun Kim
Jiaya Jia
DiffM, VGen
235
16
0
27 Dec 2024
AdaEAGLE: Optimizing Speculative Decoding via Explicit Modeling of Adaptive Draft Structures
Situo Zhang
Hankun Wang
Da Ma
Zichen Zhu
Lu Chen
Kunyao Lan
Kai Yu
214
22
0
25 Dec 2024
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers
Yuntao Chen
Yuqi Wang
Rundong Wang
925
39
0
24 Dec 2024
VidTwin: Video VAE with Decoupled Structure and Dynamics
Computer Vision and Pattern Recognition (CVPR), 2024
Yuchi Wang
Junliang Guo
Xinyi Xie
Tianyu He
Xu Sun
Li Zhao
DRL, VGen
316
7
0
23 Dec 2024
Enhancing Long Video Generation Consistency without Tuning
Xingyao Li
Fengzhuo Zhang
Jiachun Pan
Yunlong Hou
Vincent Y. F. Tan
Zhuoran Yang
DiffM, VGen
282
0
0
23 Dec 2024
Adapting Image-to-Video Diffusion Models for Large-Motion Frame Interpolation
Luoxu Jin
Hiroshi Watanabe
DiffM, VGen
474
0
0
22 Dec 2024
Label-Efficient Data Augmentation with Video Diffusion Models for Guidewire Segmentation in Cardiac Fluoroscopy
AAAI Conference on Artificial Intelligence (AAAI), 2024
Shaoyan Pan
Yikang Liu
Lin Zhao
Eric Z. Chen
Xiao Chen
Terrence Chen
Shanhui Sun
VGen, MedIm
380
1
0
20 Dec 2024
AniDoc: Animation Creation Made Easier
Computer Vision and Pattern Recognition (CVPR), 2024
Yihao Meng
Hao Ouyang
Hanlin Wang
Qiuyu Wang
Wen Wang
Ka Leong Cheng
Zhiheng Liu
Yujun Shen
Huamin Qu
DiffM, VGen
441
13
0
18 Dec 2024
Learning from Massive Human Videos for Universal Humanoid Pose Control
Jiageng Mao
Siheng Zhao
Siqi Song
Tianheng Shi
Junjie Ye
Mingtong Zhang
Haoran Geng
Jitendra Malik
Vitor Campagnolo Guizilini
Yue Wang
295
21
0
18 Dec 2024
VideoDPO: Omni-Preference Alignment for Video Diffusion Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Runtao Liu
Haoyu Wu
Zheng Ziqiang
Chen Wei
Yingqing He
Renjie Pi
Qifeng Chen
VGen
276
59
0
18 Dec 2024
FlexCache: Flexible Approximate Cache System for Video Diffusion
Desen Sun
Henry Tian
Tim Lu
Sihang Liu
DiffM
472
3
0
18 Dec 2024
Move-in-2D: 2D-Conditioned Human Motion Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Hsin-Ping Huang
Yang Zhou
Jui-Hsien Wang
Difan Liu
Feng Liu
Ming-Hsuan Yang
Zhan Xu
VGen, DiffM
168
3
0
17 Dec 2024
CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2024
Felix Taubner
Ruihang Zhang
Mathieu Tuli
David B. Lindell
298
19
0
16 Dec 2024
Generative Inbetweening through Frame-wise Conditions-Driven Video Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Tianyi Zhu
Dongwei Ren
Qilong Wang
Xiaohe Wu
W. Zuo
VGen
246
7
0
16 Dec 2024
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2024
Rick Akkerman
Haiwen Feng
M. Black
Dimitrios Tzionas
Victoria Fernandez-Abrevaya
VGen, AI4CE
550
5
0
16 Dec 2024
Wonderland: Navigating 3D Scenes from a Single Image
Computer Vision and Pattern Recognition (CVPR), 2024
Hanwen Liang
Junli Cao
Vidit Goel
Guocheng Qian
Sergei Korolev
Demetri Terzopoulos
Konstantinos N. Plataniotis
Sergey Tulyakov
Jian Ren
VGen
400
49
0
16 Dec 2024
Can video generation replace cinematographers? Research on the cinematic language of generated video
Xuelong Li
Kai Wu
Siyi Yang
YiZhan Qu
Guohua Zhang
...
Mingliang Xiong
Hao Deng
Qingwen Liu
Gang Li
Bin He
VGen, DiffM
343
2
0
16 Dec 2024
VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping
Hao Shao
Shulun Wang
Yang Zhou
Guanglu Song
Dailan He
Shuo Qin
Zhuofan Zong
Bingqi Ma
Wenshu Fan
Jiaming Song
VGen, DiffM
290
2
0
15 Dec 2024
GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control
Computer Vision and Pattern Recognition (CVPR), 2024
Mariam Hassan
Sebastian Stapf
Ahmad Rahimi
Pedro M B Rezende
Yasaman Haghighi
...
Mathieu Salzmann
Davide Scaramuzza
Marc Pollefeys
Paolo Favaro
Alexandre Alahi
VLM, VGen
270
35
0
15 Dec 2024
GenLit: Reformulating Single-Image Relighting as Video Generation
Shrisha Bharadwaj
Haiwen Feng
Giorgio Becherini
Victoria Fernandez-Abrevaya
Michael J. Black
VGen
443
5
0
15 Dec 2024
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
Computer Vision and Pattern Recognition (CVPR), 2024
Zeyang Zhang
Zihan Wang
Xianrui Li
Xingwu Sun
Fangyi Chen
Jiang Liu
Jiadong Wang
Bhiksha Raj
Zicheng Liu
Emad Barsoum
VLM
544
28
0
14 Dec 2024
Video Diffusion Transformers are In-Context Learners
Zhengcong Fei
Di Qiu
Changqian Yu
Debang Li
Mingyuan Fan
VGen, DiffM
795
7
0
14 Dec 2024
SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device
Computer Vision and Pattern Recognition (CVPR), 2024
Yushu Wu
Zhixing Zhang
Yanyu Li
Yanwu Xu
Vidit Goel
...
Ju Hu
Dimitris N. Metaxas
Yanzhi Wang
Sergey Tulyakov
Jian Ren
VGen, DiffM
341
15
0
13 Dec 2024
Owl-1: Omni World Model for Consistent Long Video Generation
Yuanhui Huang
Wenzhao Zheng
Yuan Gao
Xin Tao
Pengfei Wan
Di Zhang
Jie Zhou
Jiwen Lu
VGen, VLM
368
11
0
12 Dec 2024
T-SVG: Text-Driven Stereoscopic Video Generation
Qiao Jin
Xiaodong Chen
Wu Liu
Tao Mei
Yongdong Zhang
DiffM, VGen
251
4
0
12 Dec 2024
Mojito: Motion Trajectory and Intensity Control for Video Generation
Xuehai He
Shuohang Wang
Jianwei Yang
Xiaoxia Wu
Longji Xu
Kuan-Chieh Wang
Z. Zhan
Olatunji Ruwase
Yelong Shen
Xinze Wang
VGen
610
4
0
12 Dec 2024
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
Haonan Qiu
Shiwei Zhang
Yujie Wei
Ruihang Chu
Hangjie Yuan
Xinyu Wang
Yujiao Shi
Ziwei Liu
327
17
0
12 Dec 2024
Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
International Conference on Learning Representations (ICLR), 2024
Zhen Liu
Tim Z. Xiao
Weiyang Liu
Yoshua Bengio
Dinghuai Zhang
631
18
0
10 Dec 2024
MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds
Computer Vision and Pattern Recognition (CVPR), 2024
Z-H. Tang
Yuchen Fan
Dilin Wang
Hongyu Xu
Rakesh Ranjan
Alex Schwing
Zhicheng Yan
3DGS, VGen, 3DV
224
75
0
09 Dec 2024
SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations
Zhiwen Chen
Francesco Pinto
Minzhou Pan
Bo Li
283
16
0
09 Dec 2024
Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation
Computer Vision and Pattern Recognition (CVPR), 2024
Nicolas Dufour
David Picard
Vicky Kalogeiton
Loic Landrieu
199
13
0
09 Dec 2024
MuMu-LLaMA: Multi-modal Music Understanding and Generation via Large Language Models
Shansong Liu
Atin Sakkeer Hussain
Qilong Wu
Chenshuo Sun
Ying Shan
AuLLM
223
12
0
09 Dec 2024
You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale
Computer Vision and Pattern Recognition (CVPR), 2024
Baorui Ma
Huachen Gao
Haoge Deng
Zhengxiong Luo
Tiejun Huang
Lulu Tang
Xinlong Wang
DiffM, VGen
621
43
0
09 Dec 2024
MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Shuwei Shi
Biao Gong
Xi Chen
Dandan Zheng
Shuai Tan
...
Jingwen He
Kecheng Zheng
Jingdong Chen
Ming-Hsuan Yang
Yinqiang Zheng
VGen, DiffM
199
10
0
08 Dec 2024
Birth and Death of a Rose
Computer Vision and Pattern Recognition (CVPR), 2024
Chen Geng
Yunzhi Zhang
Shangzhe Wu
Jiajun Wu
AI4CE
337
2
0
06 Dec 2024
Using Diffusion Priors for Video Amodal Segmentation
Computer Vision and Pattern Recognition (CVPR), 2024
Kaihua Chen
Deva Ramanan
Tarasha Khurana
DiffM, VOS, VGen
172
8
0
05 Dec 2024
DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models
Yizhuo Li
Yuying Ge
Yixiao Ge
Ping Luo
Mingyu Ding
DiffM, VGen
280
1
0
05 Dec 2024
InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
Yifan Lu
Xuanchi Ren
Jiawei Yang
Tianchang Shen
Zhangjie Wu
...
Yanjie Wang
Siheng Chen
Mike Chen
Sanja Fidler
Jiahui Huang
VGen
379
32
0
05 Dec 2024
FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes
Computer Vision and Pattern Recognition (CVPR), 2024
Lue Fan
Hao Zhang
Qitai Wang
Hongsheng Li
Rundong Wang
VGen, 3DGS
165
20
0
04 Dec 2024
MV-Adapter: Multi-view Consistent Image Generation Made Easy
Zehuan Huang
Xu Tan
Haoran Wang
Ran Yi
Lizhuang Ma
Yan-Pei Cao
Lu Sheng
341
63
0
04 Dec 2024
Mimir: Improving Video Diffusion Models for Precise Text Understanding
Computer Vision and Pattern Recognition (CVPR), 2024
Shuai Tan
Biao Gong
Yutong Feng
Kecheng Zheng
Dandan Zheng
Shuwei Shi
Yujun Shen
Jingdong Chen
Ming-Hsuan Yang
VGen
225
12
0
04 Dec 2024
Navigation World Models
Computer Vision and Pattern Recognition (CVPR), 2024
Amir Bar
G. Zhou
Danny Tran
Trevor Darrell
Yann LeCun
VGen, EgoV
470
123
0
04 Dec 2024
Realistic Surgical Simulation from Monocular Videos
Kailing Wang
Chen-Ning Yang
Keyang Zhao
Yunbo Wang
Wei Shen
193
1
0
03 Dec 2024
World-consistent Video Diffusion with Explicit 3D Modeling
Computer Vision and Pattern Recognition (CVPR), 2024
Qihang Zhang
Shuangfei Zhai
Miguel Angel Bautista
Kevin Miao
Alexander Toshev
J. Susskind
Jiatao Gu
VGen
253
28
0
02 Dec 2024
InfinityDrive: Breaking Time Limits in Driving World Models
Xi Guo
C. Ding
Haoxuan Dou
Xin Zhang
Weixuan Tang
Wei Wu
VGen
355
12
0
02 Dec 2024
CPA: Camera-pose-awareness Diffusion Transformer for Video Generation
Yuelei Wang
Jian Zhang
Pengtao Jiang
Hao Zhang
Jinwei Chen
Bo Li
VGen, DiffM
279
9
0
02 Dec 2024
Driving View Synthesis on Free-form Trajectories with Generative Prior
Zeyu Yang
Zijie Pan
Yuankun Yang
Xiatian Zhu
Guang Dai
VGen
402
2
0
02 Dec 2024
Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Zilyu Ye
Zhiyang Chen
Tiancheng Li
Zemin Huang
Weijian Luo
Guo-Jun Qi
DiffM
506
17
0
02 Dec 2024
Playable Game Generation
Mingyu Yang
Junyou Li
Zhongbin Fang
Sheng Chen
Yangbin Yu
Qiang Fu
Wei Yang
Deheng Ye
VGen
264
19
0
01 Dec 2024
DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses
Yatian Pang
Bin Zhu
Bin Lin
Mingzhe Zheng
Francis E. H. Tay
Ser-Nam Lim
Harry Yang
Li Yuan
VGen, 3DH
256
11
0
30 Nov 2024