Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

25 November 2023
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
Dominik Lorenz
Yam Levi
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
    VGen
ArXiv (abs) | PDF | HTML | HuggingFace (13 upvotes) | Github (25943★)

Papers citing "Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets"

50 / 967 papers shown
Frame In-N-Out: Unbounded Controllable Image-to-Video Generation
Boyang Wang
Xuweiyi Chen
Matheus Gadelha
Zezhou Cheng
DiffM, VGen
308
4
0
27 May 2025
Frame-Level Captions for Long Video Generation with Complex Multi Scenes
Guangcong Zheng
Jianlong Yuan
Bo Wang
Haoyang Huang
Guoqing Ma
Nan Duan
DiffM, VGen
262
0
0
27 May 2025
Advancing high-fidelity 3D and Texture Generation with 2.5D latents
Xin Yang
Jiantao Lin
Yingjie Xu
Haodong Li
Yingcong Chen
3DV
242
3
0
27 May 2025
RainFusion: Adaptive Video Generation Acceleration via Multi-Dimensional Visual Redundancy
Aiyue Chen
Bin Dong
Jingru Li
Kun Tian
Jing Lin
Gongyi Wang
VGen
279
2
0
27 May 2025
Long-Context State-Space Video World Models
Ryan Po
Yotam Nitzan
Richard Zhang
Berlin Chen
Tri Dao
Eli Shechtman
Gordon Wetzstein
Xun Huang
292
22
0
26 May 2025
Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals
Nate Gillman
Charles Herrmann
Michael Freeman
Daksh Aggarwal
Evan Luo
Deqing Sun
Chen Sun
VGen, AI4CE
403
10
0
26 May 2025
Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM
Peng Liu
Xiaoming Ren
Fengkai Liu
Qingsong Xie
Quanlong Zheng
Yanhao Zhang
Haonan Lu
Yujiu Yang
EGVM, VGen
332
3
0
26 May 2025
MotionPro: A Precise Motion Controller for Image-to-Video Generation
Computer Vision and Pattern Recognition (CVPR), 2025
Zhongwei Zhang
Fuchen Long
Zhaofan Qiu
Yingwei Pan
Wu Liu
Ting Yao
Tao Mei
DiffM, VGen
350
10
0
26 May 2025
HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters
Yi Chen
Sen Liang
Zixiang Zhou
Ziyao Huang
Yifeng Ma
Junshu Tang
Qin Lin
Yuan Zhou
Qinglin Lu
VGen
250
24
0
26 May 2025
AniCrafter: Customizing Realistic Human-Centric Animation via Avatar-Background Conditioning in Video Diffusion Models
Muyao Niu
Mingdeng Cao
Yifan Zhan
Qingtian Zhu
Mingze Ma
Jiancheng Zhao
Yanhong Zeng
Zhihang Zhong
Xiao Sun
Yinqiang Zheng
DiffM, VGen
262
5
0
26 May 2025
Adaptive Diffusion Guidance via Stochastic Optimal Control
Iskander Azangulov
Peter Potaptchik
Qinyu Li
Eddie Aamari
George Deligiannidis
Judith Rousseau
168
1
0
25 May 2025
ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos
Xiaodong Wang
Peixi Peng
VGen
1.3K
1
0
24 May 2025
Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation
Alexander Shabalin
Viacheslav Meshchaninov
Dmitry Vetrov
162
2
0
24 May 2025
DVD-Quant: Data-free Video Diffusion Transformers Quantization
Zhiteng Li
Hanxuan Li
Junyi Wu
Kai Liu
Linghe Kong
Guihai Chen
Yulun Zhang
Yunbo Wang
MQ, VGen
202
3
0
24 May 2025
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data
Yiren Song
Cheng Liu
Mike Zheng Shou
DiffM
372
10
0
24 May 2025
SafeMVDrive: Multi-view Safety-Critical Driving Video Synthesis in the Real World Domain
Jiawei Zhou
Linye Lyu
Zhuotao Tian
Cheng Zhuo
Yu Li
VGen
160
3
0
23 May 2025
ConfRover: Simultaneous Modeling of Protein Conformation and Dynamics via Autoregression
Yuning Shen
Lihao Wang
Huizhuo Yuan
Yan Wang
B. Yang
Quanquan Gu
DiffM, AI4CE
536
1
0
23 May 2025
Diffusion Classifiers Understand Compositionality, but Conditions Apply
Yujin Jeong
Arnas Uselis
Seong Joon Oh
Anna Rohrbach
DiffM, CoGe
1.2K
3
3
23 May 2025
DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation
Junhao Chen
Mingjin Chen
Jianjin Xu
Xiang Li
Junting Dong
...
Hongxiang Li
Yuhang Yang
Hao Zhao
Xiaoxiao Long
Ruqi Huang
DiffM, VGen
255
5
0
23 May 2025
FLEX: A Backbone for Diffusion-Based Modeling of Spatio-temporal Physical Systems
N. Benjamin Erichson
Vinicius Mikuni
Dongwei Lyu
Yang Gao
Omri Azencot
Soon Hoe Lim
Michael W. Mahoney
AI4CE
1.0K
4
0
23 May 2025
WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions
Zizhang Li
Hong-Xing Yu
Wei Liu
Yin Yang
Charles Herrmann
Gordon Wetzstein
Jiajun Wu
VGen
190
9
0
23 May 2025
Action2Dialogue: Generating Character-Centric Narratives from Scene-Level Prompts
Taewon Kang
Ming C. Lin
DiffM, VGen
341
1
0
22 May 2025
Conditional Panoramic Image Generation via Masked Autoregressive Modeling
Chaoyang Wang
Xiangtai Li
Lu Qi
X. Lin
Jinbin Bai
Qianyu Zhou
Yunhai Tong
DiffM
296
3
0
22 May 2025
Only Large Weights (And Not Skip Connections) Can Prevent the Perils of Rank Collapse
Josh Alman
Zhao Song
323
10
0
22 May 2025
DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution
Zheng Chen
Zichen Zou
Kewei Zhang
Xiongfei Su
Xin Yuan
Yong Guo
Yulun Zhang
DiffM, VGen
370
7
0
22 May 2025
M2SVid: End-to-End Inpainting and Refinement for Monocular-to-Stereo Video Conversion
Nina Shvetsova
Goutam Bhat
Prune Truong
Hilde Kuehne
Federico Tombari
DiffM, VGen, MDE
275
3
0
22 May 2025
Bigger Isn't Always Memorizing: Early Stopping Overparameterized Diffusion Models
Alessandro Favero
Antonio Sclocchi
Matthieu Wyart
DiffM
293
9
0
22 May 2025
Intentional Gesture: Deliver Your Intentions with Gestures for Speech
Pinxin Liu
Haiyang Liu
Luchuan Song
Chenliang Xu
SLR
298
7
0
21 May 2025
Programmatic Video Prediction Using Large Language Models
Hao Tang
Kevin Ellis
Suhas Lohit
Michael J. Jones
Moitreya Chatterjee
VGen
291
0
0
20 May 2025
Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers
Sucheng Ren
Qihang Yu
Ju He
Yaoyao Liu
Liang-Chieh Chen
350
6
0
20 May 2025
Learning to Integrate Diffusion ODEs by Averaging the Derivatives
Wenze Liu
Xiangyu Yue
385
4
0
20 May 2025
Vid2World: Crafting Video Diffusion Models to Interactive World Models
Siqiao Huang
Jialong Wu
Qixing Zhou
Shangchen Miao
Mingsheng Long
VGen
277
11
0
20 May 2025
Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking
Zihan Su
Xuerui Qiu
Hongbin Xu
Tangyu Jiang
Junhao Zhuang
Chun Yuan
Ming Li
Shengfeng He
Fei Richard Yu
WIGM
348
1
0
19 May 2025
RoPECraft: Training-Free Motion Transfer with Trajectory-Guided RoPE Optimization on Diffusion Transformers
Ahmet Berke Gokmen
Yigit Ekin
Bahri Batuhan Bilecen
Aysegül Dündar
689
2
0
19 May 2025
Video-GPT via Next Clip Diffusion
Shaobin Zhuang
Zhipeng Huang
Ying Zhang
Fangyikang Wang
Canmiao Fu
Binxin Yang
Chong Sun
Chen Li
Yali Wang
DiffM, VGen
559
5
0
18 May 2025
Fast RoPE Attention: Combining the Polynomial Method and Fast Fourier Transform
Josh Alman
Zhao Song
308
23
0
17 May 2025
DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling
Yuang Ai
Qihang Fan
Xuefeng Hu
Zhenheng Yang
Xiao-Yu Zhang
Huaibo Huang
DiffM
326
1
0
16 May 2025
PSDiffusion: Harmonized Multi-Layer Image Generation via Layout and Appearance Alignment
Dingbang Huang
Wenbo Li
Yifei Zhao
Xinyu Pan
Yanhong Zeng
Bo Dai
DiffM
263
7
0
16 May 2025
SpikeVideoFormer: An Efficient Spike-Driven Video Transformer with Hamming Attention and $\mathcal{O}(T)$ Complexity
Shihao Zou
Qingfeng Li
Wei Ji
Jingjing Li
Yongkui Yang
Guoqi Li
Chao Dong
314
1
0
15 May 2025
EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models
Hu Yue
Siyuan Huang
Yue Liao
Shengcong Chen
Pengfei Zhou
Liliang Chen
Maoqing Yao
VGen
288
8
0
14 May 2025
Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets
Weiyu Li
Xiao-Yong Zhang
Zheng Sun
Di Qi
Haoyang Li
...
Zeming Li
Gang Yu
Xiangyu Zhang
Daxin Jiang
Ping Tan
342
29
0
12 May 2025
Generative Pre-trained Autoregressive Diffusion Transformer
Yuan Zhang
Jiacheng Jiang
Guoqing Ma
Zhiying Lu
Haoyang Huang
Jianlong Yuan
Nan Duan
Daxin Jiang
VGen
546
6
0
12 May 2025
DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models
Junhao Xia
Chaoyang Zhang
Yecheng Zhang
Chengyang Zhou
Zhichang Wang
Bochun Liu
Dongshuo Yin
DiffM, VGen
277
0
0
11 May 2025
BridgeIV: Bridging Customized Image and Video Generation through Test-Time Autoregressive Identity Propagation
Panwen Hu
Jiehui Huang
Qiang Sun
Xiaodan Liang
DiffM, VGen
203
0
0
11 May 2025
ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images
Xianghao Kong
Qiaosong Qi
Yuanbin Wang
Anyi Rao
Biaolong Chen
Aixi Zhang
Si Liu
Hao Jiang
DiffM, VGen
226
4
0
10 May 2025
The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
Jae-Won Chung
Jiachen Liu
Jeff J. Ma
Oh Jun Kweon
Yuxuan Xia
Zhiyu Wu
Mosharaf Chowdhury
569
8
0
09 May 2025
Demystifying Diffusion Policies: Action Memorization and Simple Lookup Table Alternatives
Chengyang He
Xu Liu
Gadiel Sznaier Camps
Guillaume Sartoretti
Mac Schwager
259
4
0
09 May 2025
ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability
Wataru Nakata
Yuma Koizumi
Shigeki Karita
Robin Scheibler
Haruko Ishikawa
Adriana Guevara-Rukoz
Heiga Zen
M. Bacchiani
317
2
0
08 May 2025
T2VTextBench: A Human Evaluation Benchmark for Textual Control in Video Generation Models
Xuyang Guo
Jiayan Huo
Zhenmei Shi
Zhao Song
Jiahao Zhang
Jiale Zhao
VGen
1.0K
8
0
08 May 2025
AgentSGEN: Multi-Agent LLM in the Loop for Semantic Collaboration and GENeration of Synthetic Data
Vu Dinh Xuan
Hao Vo
David Murphy
Hoang D. Nguyen
SyDa
180
2
0
07 May 2025