ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2402.17177
  4. Cited By
Sora: A Review on Background, Technology, Limitations, and Opportunities
  of Large Vision Models

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

27 February 2024
Yixin Liu
Kai Zhang
Yuan Li
Zhiling Yan
Chujie Gao
Ruoxi Chen
Zhengqing Yuan
Yue Huang
Hanchi Sun
Jianfeng Gao
Lifang He
Lichao Sun
    VLM
    VGen
    EGVM
ArXivPDFHTML

Papers citing "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models"

50 / 53 papers shown
Title
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos
Zongxia Li
Xiyang Wu
Yubin Qin
Guangyao Shi
Hongyang Du
Dinesh Manocha
Tianyi Zhou
Jordan Boyd-Graber
MLLM
41
0
0
02 May 2025
A Rusty Link in the AI Supply Chain: Detecting Evil Configurations in Model Repositories
A Rusty Link in the AI Supply Chain: Detecting Evil Configurations in Model Repositories
Ziqi Ding
Qian Fu
Junchen Ding
Gelei Deng
Yi Liu
Yuekang Li
25
0
0
02 May 2025
Simple Visual Artifact Detection in Sora-Generated Videos
Simple Visual Artifact Detection in Sora-Generated Videos
Misora Sugiyama
Hirokatsu Kataoka
EGVM
45
0
0
30 Apr 2025
EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation
EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation
Zihao Zhang
Haoran Chen
Haoyu Zhao
Guansong Lu
Yanwei Fu
Hang Xu
Zuxuan Wu
VGen
DiffM
62
0
0
20 Mar 2025
V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes
V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes
Yanming Zhang
Jun-Kun Chen
Jipeng Lyu
Yu-Xiong Wang
DiffM
VGen
44
0
0
13 Mar 2025
DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness
DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness
Yiming Zhong
Qi Jiang
Jingyi Yu
Yuexin Ma
46
2
0
11 Mar 2025
Post-Training Quantization for Diffusion Transformer via Hierarchical Timestep Grouping
Post-Training Quantization for Diffusion Transformer via Hierarchical Timestep Grouping
Ning Ding
Jing Han
Yuchuan Tian
Chao Xu
Kai Han
Yehui Tang
MQ
35
0
0
10 Mar 2025
Get In Video: Add Anything You Want to the Video
Shaobin Zhuang
Zhipeng Huang
Binxin Yang
Ying Zhang
Fangyikang Wang
Canmiao Fu
Chong Sun
Zheng-Jun Zha
Chen Li
Y. Wang
DiffM
VGen
38
0
0
08 Mar 2025
BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modelling
BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modelling
Hao Li
Yu Huang
Chang Xu
Viktor Schlegel
Ren-He Jiang
R. Batista-Navarro
Goran Nenadic
Jiang Bian
DiffM
AI4CE
71
3
0
04 Mar 2025
DiffGuard: Text-Based Safety Checker for Diffusion Models
DiffGuard: Text-Based Safety Checker for Diffusion Models
Massine El Khader
Elias Al Bouzidi
Abdellah Oumida
Mohammed Sbaihi
Eliott Binard
Jean-Philippe Poli
Wassila Ouerdane
Boussad Addad
Katarzyna Kapusta
DiffM
97
0
0
20 Feb 2025
LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation
Junchen Fu
Xuri Ge
Kaiwen Zheng
Ioannis Arapakis
Xin Xin
J. Jose
65
0
0
20 Feb 2025
A Physical Coherence Benchmark for Evaluating Video Generation Models via Optical Flow-guided Frame Prediction
A Physical Coherence Benchmark for Evaluating Video Generation Models via Optical Flow-guided Frame Prediction
Yongfan Chen
Xiuwen Zhu
Tianyu Li
EGVM
VGen
40
3
0
08 Feb 2025
FairT2I: Mitigating Social Bias in Text-to-Image Generation via Large Language Model-Assisted Detection and Attribute Rebalancing
FairT2I: Mitigating Social Bias in Text-to-Image Generation via Large Language Model-Assisted Detection and Attribute Rebalancing
Jinya Sakurai
Issei Sato
60
0
0
06 Feb 2025
Accelerate High-Quality Diffusion Models with Inner Loop Feedback
Accelerate High-Quality Diffusion Models with Inner Loop Feedback
M. Gwilliam
Han Cai
Di Wu
Abhinav Shrivastava
Zhiyu Cheng
79
0
0
22 Jan 2025
Label-Efficient Data Augmentation with Video Diffusion Models for Guidewire Segmentation in Cardiac Fluoroscopy
Label-Efficient Data Augmentation with Video Diffusion Models for Guidewire Segmentation in Cardiac Fluoroscopy
Shaoyan Pan
Yikang Liu
Lin Zhao
Eric Z. Chen
Xiao Chen
Terrence Chen
Shanhui Sun
VGen
MedIm
73
0
0
20 Dec 2024
Wonderland: Navigating 3D Scenes from a Single Image
Wonderland: Navigating 3D Scenes from a Single Image
Hanwen Liang
Junli Cao
Vidit Goel
Guocheng Qian
Sergei Korolev
Demetri Terzopoulos
Konstantinos N. Plataniotis
Sergey Tulyakov
Jian Ren
VGen
122
11
0
16 Dec 2024
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models
Rick Akkerman
Haiwen Feng
M. Black
Dimitrios Tzionas
Victoria Fernandez-Abrevaya
VGen
AI4CE
82
3
0
16 Dec 2024
SVGFusion: Scalable Text-to-SVG Generation via Vector Space Diffusion
SVGFusion: Scalable Text-to-SVG Generation via Vector Space Diffusion
Ximing Xing
Juncheng Hu
Jing Zhang
Dong Xu
Qian Yu
62
1
0
11 Dec 2024
Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer
Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer
Jiahao Cui
Hui Li
Yun Zhan
Hanlin Shang
K. Cheng
Yuqi Ma
Shan Mu
Hang Zhou
Jingdong Wang
Siyu Zhu
ViT
VGen
70
6
0
01 Dec 2024
SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input
SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input
Zhen Lv
Yangqi Long
Congzhentao Huang
Cao Li
Chengfei Lv
Hao Ren
Dian Zheng
DiffM
VGen
MDE
110
5
0
18 Nov 2024
On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models
On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models
Tariq Berrada Ifriqi
Pietro Astolfi
Melissa Hall
Reyhane Askari Hemmat
Yohann Benchetrit
...
Matthew Muckley
Karteek Alahari
Adriana Romero Soriano
Jakob Verbeek
M. Drozdzal
AI4CE
VLM
30
3
0
05 Nov 2024
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?
Han Bao
Yue Huang
Yanbo Wang
Jiayi Ye
Xiangqi Wang
Xiuying Chen
Mohamed Elhoseiny
X. Zhang
Mohamed Elhoseiny
Xiangliang Zhang
32
7
0
28 Oct 2024
Multi-Scale Diffusion: Enhancing Spatial Layout in High-Resolution Panoramic Image Generation
Multi-Scale Diffusion: Enhancing Spatial Layout in High-Resolution Panoramic Image Generation
Xiaoyu Zhang
Teng Zhou
Xinlong Zhang
Jia Wei
Yongchuan Tang
24
1
0
24 Oct 2024
LT3SD: Latent Trees for 3D Scene Diffusion
LT3SD: Latent Trees for 3D Scene Diffusion
Quan Meng
Lei Li
Matthias Nießner
Angela Dai
85
10
0
12 Sep 2024
MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model
MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model
Junjie Li
Yang Liu
Weiqing Liu
Shikai Fang
Lewen Wang
Chang Xu
Jiang Bian
VGen
21
3
0
04 Sep 2024
Differentially Private Kernel Density Estimation
Differentially Private Kernel Density Estimation
Erzhi Liu
Jerry Yao-Chieh Hu
Alex Reneau
Zhao Song
Han Liu
32
3
0
03 Sep 2024
CATD: Unified Representation Learning for EEG-to-fMRI Cross-Modal Generation
CATD: Unified Representation Learning for EEG-to-fMRI Cross-Modal Generation
Weiheng Yao
Shuqiang Wang
Mufti Mahmud
Ning Zhong
Baiying Lei
Shuqiang Wang
MedIm
DiffM
17
1
0
16 Jul 2024
LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control
LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control
Delin Qu
Qizhi Chen
Pingrui Zhang
Xianqiang Gao
Bin Zhao
Bin Zhao
Dong Wang
Xuelong Li
AI4CE
25
7
0
23 Jun 2024
TerDiT: Ternary Diffusion Models with Transformers
TerDiT: Ternary Diffusion Models with Transformers
Xudong Lu
Aojun Zhou
Ziyi Lin
Qi Liu
Yuhui Xu
Renrui Zhang
Yafei Wen
Shuai Ren
Peng Gao
Junchi Yan
MQ
22
2
0
23 May 2024
COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
Xing-ming Guo
Fangxu Yu
Huan Zhang
Lianhui Qin
Bin Hu
AAML
101
69
0
13 Feb 2024
I Think, Therefore I am: Benchmarking Awareness of Large Language Models
  Using AwareBench
I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench
Yuan Li
Yue Huang
Yuli Lin
Siyuan Wu
Yao Wan
Lichao Sun
LLMAG
ELM
32
1
0
31 Jan 2024
Lumiere: A Space-Time Diffusion Model for Video Generation
Lumiere: A Space-Time Diffusion Model for Video Generation
Omer Bar-Tal
Hila Chefer
Omer Tov
Charles Herrmann
Roni Paiss
...
T. Michaeli
Oliver Wang
Deqing Sun
Tali Dekel
Inbar Mosseri
VGen
98
90
0
23 Jan 2024
VideoCrafter2: Overcoming Data Limitations for High-Quality Video
  Diffusion Models
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Haoxin Chen
Yong Zhang
Xiaodong Cun
Menghan Xia
Xintao Wang
Chao-Liang Weng
Ying Shan
VGen
DiffM
115
269
0
17 Jan 2024
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from
  Fine-grained Correctional Human Feedback
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
M. Steyvers
Yuan Yao
Haoye Zhang
Taiwen He
Yifeng Han
...
Xinyue Hu
Zhiyuan Liu
Hai-Tao Zheng
Maosong Sun
Tat-Seng Chua
MLLM
VLM
122
56
0
01 Dec 2023
AlignBench: Benchmarking Chinese Alignment of Large Language Models
AlignBench: Benchmarking Chinese Alignment of Large Language Models
Xiao Liu
Xuanyu Lei
Sheng-Ping Wang
Yue Huang
Zhuoer Feng
...
Hongning Wang
Jing Zhang
Minlie Huang
Yuxiao Dong
Jie Tang
ELM
LM&MA
ALM
111
41
0
30 Nov 2023
Adversarial Diffusion Distillation
Adversarial Diffusion Distillation
Axel Sauer
Dominik Lorenz
A. Blattmann
Robin Rombach
130
138
0
28 Nov 2023
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large
  Datasets
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
...
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
VGen
150
985
0
25 Nov 2023
Control-A-Video: Controllable Text-to-Video Generation with Diffusion
  Models
Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models
Weifeng Chen
Yatai Ji
Jie Wu
Hefeng Wu
Pan Xie
Jiashi Li
Xin Xia
Xuefeng Xiao
Liang Lin
VGen
121
6
0
23 May 2023
MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer
MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer
Shanghua Gao
Pan Zhou
Mingg-Ming Cheng
Shuicheng Yan
DiffM
116
80
0
25 Mar 2023
VideoFusion: Decomposed Diffusion Models for High-Quality Video
  Generation
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
Zhengxiong Luo
Dayou Chen
Yingya Zhang
Yan Huang
Liangsheng Wang
Yujun Shen
Deli Zhao
Jinren Zhou
Tien-Ping Tan
DiffM
VGen
126
301
0
15 Mar 2023
DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics
DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics
Ivan Kapelyukh
Vitalis Vosylius
Edward Johns
LM&Ro
DiffM
91
143
0
05 Oct 2022
Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis
Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis
Long Zhuo
Guangcong Wang
Shikai Li
Wayne Wu
Ziwei Liu
VGen
42
20
0
11 Jul 2022
MaskViT: Masked Visual Pre-Training for Video Prediction
MaskViT: Masked Visual Pre-Training for Video Prediction
Agrim Gupta
Stephen Tian
Yunzhi Zhang
Jiajun Wu
Roberto Martín-Martín
Li Fei-Fei
85
110
0
23 Jun 2022
CogVideo: Large-scale Pretraining for Text-to-Video Generation via
  Transformers
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Wenyi Hong
Ming Ding
Wendi Zheng
Xinghan Liu
Jie Tang
DiffM
235
333
0
29 May 2022
Flexible Diffusion Modeling of Long Videos
Flexible Diffusion Modeling of Long Videos
William Harvey
Saeid Naderiparizi
Vaden Masrani
Christian Weilbach
Frank D. Wood
DiffM
BDL
VGen
161
213
0
23 May 2022
Planning with Diffusion for Flexible Behavior Synthesis
Planning with Diffusion for Flexible Behavior Synthesis
Michael Janner
Yilun Du
J. Tenenbaum
Sergey Levine
DiffM
190
381
0
20 May 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
8,441
0
04 Mar 2022
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
255
5,353
0
11 Nov 2021
Multitask Prompted Training Enables Zero-Shot Task Generalization
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
203
1,651
0
15 Oct 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
272
3,784
0
18 Apr 2021
12
Next