2405.03520
Cited By
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
6 May 2024
Zheng Zhu
Xiaofeng Wang
Wangbo Zhao
Chen Min
Nianchen Deng
Min Dou
Yuqi Wang
Botian Shi
Kai Wang
Chi Zhang
Yang You
Zhaoxiang Zhang
Dawei Zhao
Liang Xiao
Jian-jun Zhao
Jiwen Lu
Guan Huang
VGen
LM&Ro
Papers citing
"Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond"
24 / 24 papers shown
From Sora What We Can See: A Survey of Text-to-Video Generation
Rui Sun
Yumin Zhang
Tejal Shah
Jiahao Sun
Shuoying Zhang
Wenqi Li
Haoran Duan
Bo Wei
R. Ranjan
EGVM
30
16
0
17 May 2024
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
Roberto Henschel
Levon Khachatryan
Daniil Hayrapetyan
Hayk Poghosyan
Vahram Tadevosyan
Zhangyang Wang
Shant Navasardyan
Humphrey Shi
DiffM
VGen
38
17
0
21 Mar 2024
Generalized Predictive Model for Autonomous Driving
Jiazhi Yang
Shenyuan Gao
Yihang Qiu
Li Chen
Tianyu Li
...
Ping Luo
Jun Zhang
Andreas Geiger
Yu Qiao
Hongyang Li
VGen
20
9
0
14 Mar 2024
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Ekaterina Deyneka
Hsiang-wei Chao
...
Yuwei Fang
Hsin-Ying Lee
Jian Ren
Ming-Hsuan Yang
Sergey Tulyakov
VGen
32
49
0
29 Feb 2024
Revisiting Feature Prediction for Learning Visual Representations from Video
Adrien Bardes
Q. Garrido
Jean Ponce
Xinlei Chen
Michael G. Rabbat
Yann LeCun
Mahmoud Assran
Nicolas Ballas
MDE
VLM
31
16
0
15 Feb 2024
Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
Yuqing Wen
Yucheng Zhao
Yingfei Liu
Fan Jia
Yanhui Wang
Chong Luo
Chi Zhang
Tiancai Wang
Xiaoyan Sun
Xiangyu Zhang
29
15
0
28 Nov 2023
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
...
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
VGen
124
379
0
25 Nov 2023
STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning
Weipu Zhang
Gang Wang
Jian-jun Sun
Yetian Yuan
Gao Huang
28
12
0
14 Oct 2023
DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model
Xiaofan Li
Yifu Zhang
Xiaoqing Ye
VGen
26
29
0
11 Oct 2023
One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
Fan Bao
Shen Nie
Kaiwen Xue
Chongxuan Li
Shiliang Pu
Yaole Wang
Gang Yue
Yue Cao
Hang Su
Jun Zhu
DiffM
165
102
0
12 Mar 2023
TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction
Zhejun Zhang
Alexander Liniger
Dengxin Dai
Fisher Yu
Luc Van Gool
33
20
0
07 Mar 2023
Model-Based Imitation Learning for Urban Driving
Anthony Hu
Gianluca Corrado
Nicolas Griffiths
Zak Murez
Corina Gurau
Hudson Yeo
Alex Kendall
R. Cipolla
Jamie Shotton
55
87
0
14 Oct 2022
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Wenyi Hong
Ming Ding
Wendi Zheng
Xinghan Liu
Jie Tang
DiffM
203
333
0
29 May 2022
Autoregressive Image Generation using Residual Quantization
Doyup Lee
Chiheon Kim
Saehoon Kim
Minsu Cho
Wook-Shin Han
VGen
130
159
0
03 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
359
2,713
0
28 Jan 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
233
5,353
0
11 Nov 2021
High-Fidelity GAN Inversion for Image Attribute Editing
Tengfei Wang
Yong Zhang
Yanbo Fan
Jue Wang
Qifeng Chen
DiffM
154
211
0
14 Sep 2021
AA-RMVSNet: Adaptive Aggregation Recurrent Multi-view Stereo Network
Zizhuang Wei
Qingtian Zhu
Chen Min
Yisong Chen
Guoping Wang
3DV
43
113
0
09 Aug 2021
VideoGPT: Video Generation using VQ-VAE and Transformers
Wilson Yan
Yunzhi Zhang
Pieter Abbeel
A. Srinivas
ViT
VGen
205
342
0
20 Apr 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
253
1,486
0
09 Feb 2021
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning
Sangho Lee
Jiwan Chung
Youngjae Yu
Gunhee Kim
Thomas Breuel
Gal Chechik
Yale Song
27
36
0
26 Jan 2021
A Style-Based Generator Architecture for Generative Adversarial Networks
Tero Karras
S. Laine
Timo Aila
243
8,946
0
12 Dec 2018
Densely Connected Convolutional Networks
Gao Huang
Zhuang Liu
Laurens van der Maaten
Kilian Q. Weinberger
PINN
3DV
212
9,849
0
25 Aug 2016
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg
3DV
177
9,999
0
18 May 2015