Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.14797
Cited By
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
22 February 2024
Willi Menapace
Aliaksandr Siarohin
Ivan Skorokhodov
Ekaterina Deyneka
Tsai-Shien Chen
Anil Kag
Yuwei Fang
A. Stoliar
Elisa Ricci
Jian Ren
Sergey Tulyakov
VGen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis"
48 / 48 papers shown
Title
DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization
Wenchuan Wang
Mengqi Huang
Yijing Tu
Zhendong Mao
VGen
61
0
0
04 May 2025
SynergyAmodal: Deocclude Anything with Text Control
Xinyang Li
Chengjie Yi
Jiawei Lai
Mingbao Lin
Yansong Qu
Shengchuan Zhang
Liujuan Cao
DiffM
73
0
0
28 Apr 2025
H3AE: High Compression, High Speed, and High Quality AutoEncoder for Video Diffusion Models
Yushu Wu
Yanyu Li
Ivan Skorokhodov
Anil Kag
Willi Menapace
Sharath Girish
Aliaksandr Siarohin
Yanzhi Wang
Sergey Tulyakov
DiffM
VGen
35
0
0
14 Apr 2025
RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism
E. Peruzzo
Dejia Xu
Xingqian Xu
Humphrey Shi
N. Sebe
DiffM
VGen
54
0
0
09 Apr 2025
FlowR: Flowing from Sparse to Dense 3D Reconstructions
Tobias Fischer
Samuel Rota Buló
Yung-Hsu Yang
Nikhil Varma Keetha
Lorenzo Porzi
Norman Muller
Katja Schwarz
Jonathon Luiten
Marc Pollefeys
Peter Kontschieder
3DGS
39
0
0
02 Apr 2025
Can Video Diffusion Model Reconstruct 4D Geometry?
Jinjie Mai
Wenxuan Zhu
Haozhe Liu
Bing Li
Cheng Zheng
Jürgen Schmidhuber
Bernard Ghanem
VGen
MDE
70
0
0
27 Mar 2025
Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency
T. Liu
Z. Huang
Zhaoxi Chen
Guangcong Wang
Shoukang Hu
Liao Shen
Huiqiang Sun
Z. Cao
Wei Li
Z. Liu
VGen
3DGS
76
0
0
26 Mar 2025
FullDiT: Multi-Task Video Generative Foundation Model with Full Attention
Xuan Ju
Weicai Ye
Quande Liu
Qiulin Wang
Xintao Wang
Pengfei Wan
Di Zhang
Kun Gai
Qiang Xu
VGen
39
1
0
25 Mar 2025
AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset
Haiyu Zhang
Xinyuan Chen
Yaohui Wang
Xihui Liu
Yunhong Wang
Yu Qiao
VGen
59
0
0
25 Mar 2025
Can Text-to-Video Generation help Video-Language Alignment?
Luca Zanella
Massimiliano Mancini
Willi Menapace
Sergey Tulyakov
Yiming Wang
Elisa Ricci
DiffM
VGen
55
0
0
24 Mar 2025
ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos
Haolin Yang
Feilong Tang
Ming Hu
Yulong Li
Junjie Guo
Yexin Liu
Zelin Peng
Junjun He
Zongyuan Ge
VGen
DiffM
94
0
0
20 Mar 2025
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
Jianhong Bai
Menghan Xia
Xiao Fu
Xintao Wang
Lianrui Mu
...
Zuozhu Liu
Haoji Hu
Xiang Bai
Pengfei Wan
Di Zhang
DiffM
VGen
43
3
0
14 Mar 2025
CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models
Hao He
Ceyuan Yang
Shanchuan Lin
Yinghao Xu
Meng Wei
Liangke Gui
Qi Zhao
Gordon Wetzstein
Lu Jiang
Hongsheng Li
DiffM
VGen
95
5
0
13 Mar 2025
REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder
Yitian Zhang
Long Mai
Aniruddha Mahapatra
David Bourgin
Yicong Hong
Jonah Casebeer
Feng Liu
Y. Fu
DiffM
VGen
43
0
0
11 Mar 2025
FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion
Ziyi Yang
Fanqi Wan
Longguang Zhong
Canbin Huang
Guosheng Liang
Xiaojun Quan
MoMe
87
1
0
06 Mar 2025
Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach
Yunuo Chen
Junli Cao
Anil Kag
Vidit Goel
Sergei Korolev
Chenfanfu Jiang
Sergey Tulyakov
Jian Ren
DiffM
VGen
86
1
0
05 Feb 2025
HuViDPO:Enhancing Video Generation through Direct Preference Optimization for Human-Centric Alignment
Lifan Jiang
Boxi Wu
Jiahui Zhang
Xiaotong Guan
Shuang Chen
VGen
56
1
0
02 Feb 2025
Multi-subject Open-set Personalization in Video Generation
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Yuwei Fang
Kwot Sin Lee
Ivan Skorokhodov
Kfir Aberman
Jun-Yan Zhu
Ming Yang
Sergey Tulyakov
DiffM
VGen
65
7
0
10 Jan 2025
AKiRa: Augmentation Kit on Rays for optical video generation
Xi Wang
Robin Courant
Marc Christie
Vicky Kalogeiton
VGen
98
3
0
31 Dec 2024
Bridging Interpretability and Robustness Using LIME-Guided Model Refinement
Navid Nayyem
Abdullah Rakin
Longwei Wang
AAML
FAtt
54
1
0
25 Dec 2024
Wonderland: Navigating 3D Scenes from a Single Image
Hanwen Liang
Junli Cao
Vidit Goel
Guocheng Qian
Sergei Korolev
Demetri Terzopoulos
Konstantinos N. Plataniotis
Sergey Tulyakov
Jian Ren
VGen
125
11
0
16 Dec 2024
Remix-DiT: Mixing Diffusion Transformers for Multi-Expert Denoising
Gongfan Fang
Xinyin Ma
Xinchao Wang
DiffM
MoE
99
0
0
07 Dec 2024
4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion
Chaoyang Wang
Peiye Zhuang
Tuan Duc Ngo
Willi Menapace
Aliaksandr Siarohin
Michael Vasilkovsky
Ivan Skorokhodov
Sergey Tulyakov
Peter Wonka
Hsin-Ying Lee
DiffM
VGen
90
3
0
05 Dec 2024
CPA: Camera-pose-awareness Diffusion Transformer for Video Generation
Yuelei Wang
Jian Zhang
Pengtao Jiang
H. Zhang
Jinwei Chen
Bo Li
VGen
DiffM
105
2
0
02 Dec 2024
MatchDiffusion: Training-free Generation of Match-cuts
Alejandro Pardo
Fabio Pizzati
Tong Zhang
Alexander Pondaven
Philip H. S. Torr
Juan C. Pérez
Bernard Ghanem
DiffM
VGen
75
1
0
27 Nov 2024
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
Rundi Wu
Ruiqi Gao
Ben Poole
Alex Trevithick
Changxi Zheng
Jonathan T. Barron
Aleksander Holyñski
VGen
76
17
0
27 Nov 2024
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
David Junhao Zhang
Roni Paiss
Shiran Zada
Nikhil Karnad
David E. Jacobs
Yael Pritch
Inbar Mosseri
Mike Zheng Shou
Neal Wadhwa
Nataniel Ruiz
DiffM
VGen
66
14
0
07 Nov 2024
AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation
Anil Kag
Huseyin Coskun
Jierun Chen
Junli Cao
Willi Menapace
Aliaksandr Siarohin
Sergey Tulyakov
Jian Ren
38
2
0
07 Nov 2024
Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data
Tao Yang
Yangming Shi
Yunwen Huang
Feng Chen
Yin Zheng
Lei Zhang
DiffM
VGen
59
0
0
19 Aug 2024
FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance
Jiasong Feng
Ao Ma
Jing Wang
Bo Cheng
Xiaodan Liang
Dawei Leng
Yuhui Yin
DiffM
VGen
32
6
0
15 Aug 2024
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
Sherwin Bahmani
Ivan Skorokhodov
Aliaksandr Siarohin
Willi Menapace
Guocheng Qian
...
Chaoyang Wang
Jiaxu Zou
Andrea Tagliasacchi
David B. Lindell
Sergey Tulyakov
VGen
DiffM
67
41
0
17 Jul 2024
MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
Xuan Ju
Yiming Gao
Zhaoyang Zhang
Ziyang Yuan
Xintao Wang
Ailing Zeng
Yu Xiong
Qiang Xu
Ying Shan
VGen
59
36
0
08 Jul 2024
VIMI: Grounding Video Generation through Multi-modal Instruction
Yuwei Fang
Willi Menapace
Aliaksandr Siarohin
Tsai-Shien Chen
Kuan-Chien Wang
Ivan Skorokhodov
Graham Neubig
Sergey Tulyakov
VGen
55
2
0
08 Jul 2024
Hierarchical Patch Diffusion Models for High-Resolution Video Generation
Ivan Skorokhodov
Willi Menapace
Aliaksandr Siarohin
Sergey Tulyakov
VGen
34
1
0
12 Jun 2024
4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models
Heng Yu
Chaoyang Wang
Peiye Zhuang
Willi Menapace
Aliaksandr Siarohin
Junli Cao
László A. Jeni
Sergey Tulyakov
Hsin-Ying Lee
VGen
43
12
0
11 Jun 2024
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model
Yang Sui
Yanyu Li
Anil Kag
Yerlan Idelbayev
Junli Cao
Ju Hu
Dhritiman Sagar
Bo Yuan
Sergey Tulyakov
Jian Ren
MQ
30
17
0
06 Jun 2024
SF-V: Single Forward Video Generation Model
Zhixing Zhang
Yanyu Li
Yushu Wu
Yanwu Xu
Anil Kag
...
Aliaksandr Siarohin
Junli Cao
Dimitris N. Metaxas
Sergey Tulyakov
Jian Ren
DiffM
VGen
23
1
0
06 Jun 2024
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
Han Lin
Jaemin Cho
Abhaysinh Zala
Mohit Bansal
DiffM
VGen
58
20
0
15 Apr 2024
TC4D: Trajectory-Conditioned Text-to-4D Generation
Sherwin Bahmani
Xian Liu
Yifan Wang
Ivan Skorokhodov
Victor Rong
...
Jeong Joon Park
Sergey Tulyakov
Gordon Wetzstein
Andrea Tagliasacchi
David B. Lindell
94
34
0
26 Mar 2024
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Yixin Liu
Kai Zhang
Yuan Li
Zhiling Yan
Chujie Gao
...
Yue Huang
Hanchi Sun
Jianfeng Gao
Lifang He
Lichao Sun
VLM
VGen
EGVM
65
241
0
27 Feb 2024
VBench: Comprehensive Benchmark Suite for Video Generative Models
Ziqi Huang
Yinan He
Jiashuo Yu
Fan Zhang
Chenyang Si
...
Xinyuan Chen
Limin Wang
Dahua Lin
Yu Qiao
Ziwei Liu
VGen
59
341
0
29 Nov 2023
A Survey on Video Diffusion Models
Zhen Xing
Qijun Feng
Haoran Chen
Qi Dai
Hang-Rui Hu
Hang Xu
Zuxuan Wu
Yu-Gang Jiang
EGVM
VGen
50
112
0
16 Oct 2023
LEO: Generative Latent Image Animator for Human Video Synthesis
Yaohui Wang
Xin Ma
Xinyuan Chen
A. Dantcheva
Bo Dai
Yu Qiao
DiffM
54
29
0
06 May 2023
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
Zhengxiong Luo
Dayou Chen
Yingya Zhang
Yan Huang
Liangsheng Wang
Yujun Shen
Deli Zhao
Jinren Zhou
Tien-Ping Tan
DiffM
VGen
132
215
0
15 Mar 2023
f-DM: A Multi-stage Diffusion Model via Progressive Signal Transformation
Jiatao Gu
Shuangfei Zhai
Yizhe Zhang
Miguel Angel Bautista
J. Susskind
DiffM
36
26
0
10 Oct 2022
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Wenyi Hong
Ming Ding
Wendi Zheng
Xinghan Liu
Jie Tang
DiffM
235
556
0
29 May 2022
VideoGPT: Video Generation using VQ-VAE and Transformers
Wilson Yan
Yunzhi Zhang
Pieter Abbeel
A. Srinivas
ViT
VGen
237
482
0
20 Apr 2021
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg
3DV
229
74,467
0
18 May 2015
1