ResearchTrend.AI
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

28 November 2023
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
Yi Liu
Zun Wang
Jilan Xu
Guo Chen
Ping Luo
Limin Wang
Yu Qiao
    VLM
    MLLM

Papers citing "MVBench: A Comprehensive Multi-modal Video Understanding Benchmark"

50 / 312 papers shown
Towards Retrieval Augmented Generation over Large Video Libraries
Yannis Tevissen
Khalil Guetari
Frédéric Petitpont
RALM
30
2
0
21 Jun 2024
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
Xinyu Fang
Kangrui Mao
Haodong Duan
Xiangyu Zhao
Yining Li
Dahua Lin
Kai Chen
VLM
49
61
0
20 Jun 2024
Towards Event-oriented Long Video Understanding
Yifan Du
Kun Zhou
Yuqi Huo
Yifan Li
Wayne Xin Zhao
Haoyu Lu
Zijia Zhao
Bingning Wang
Weipeng Chen
Ji-Rong Wen
VLM
19
13
0
20 Jun 2024
VideoVista: A Versatile Benchmark for Video Understanding and Reasoning
Yunxin Li
Xinyu Chen
Baotian Hu
Longyue Wang
Haoyuan Shi
Min-Ling Zhang
MLLM
LRM
38
25
0
17 Jun 2024
Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space
Yuan Wang
Zhao Wang
Junhao Gong
Di Huang
Tong He
...
J. Jiao
Xuetao Feng
Qi Dou
Shixiang Tang
Dan Xu
30
3
0
17 Jun 2024
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
Zebang Cheng
Zhi-Qi Cheng
Jun-Yan He
Jingdong Sun
Kai Wang
Yuxiang Lin
Zheng Lian
Xiaojiang Peng
Alexander G. Hauptmann
MLLM
29
28
0
17 Jun 2024
Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model
Lu Xu
Sijie Zhu
Chunyuan Li
Chia-Wen Kuo
Fan Chen
Xinyao Wang
Guang Chen
Dawei Du
Ye Yuan
Longyin Wen
30
4
0
15 Jun 2024
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding
Yiqi Wu
Xiaodan Hu
Ziming Fu
Siling Zhou
Jiangong Li
MLLM
22
9
0
14 Jun 2024
VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs
Rohit K Bharadwaj
Hanan Gani
Muzammal Naseer
F. Khan
Salman Khan
47
3
0
14 Jun 2024
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
Muhammad Maaz
H. Rasheed
Salman Khan
Fahad A Khan
VLM
MLLM
19
49
0
13 Jun 2024
Explore the Limits of Omni-modal Pretraining at Scale
Yiyuan Zhang
Handong Li
Jing Liu
Xiangyu Yue
VLM
LRM
38
1
0
13 Jun 2024
INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance
Chenwei Lin
Hanjia Lyu
Xian Xu
Jiebo Luo
27
1
0
13 Jun 2024
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs
Zijia Zhao
Haoyu Lu
Yuqi Huo
Yifan Du
Tongtian Yue
Longteng Guo
Bingning Wang
Weipeng Chen
Jing Liu
31
2
0
13 Jun 2024
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Xuehai He
Weixi Feng
Kaizhi Zheng
Yujie Lu
Wanrong Zhu
...
Zhengyuan Yang
Kevin Lin
William Yang Wang
Lijuan Wang
Xin Eric Wang
VGen
LRM
33
12
0
12 Jun 2024
LVBench: An Extreme Long Video Understanding Benchmark
Weihan Wang
Zehai He
Wenyi Hong
Yean Cheng
Xiaohan Zhang
...
Shiyu Huang
Bin Xu
Yuxiao Dong
Ming Ding
Jie Tang
ELM
VLM
38
63
0
12 Jun 2024
Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models
Shimin Chen
Yitian Yuan
Shaoxiang Chen
Zequn Jie
Lin Ma
VLM
24
3
0
12 Jun 2024
Needle In A Multimodal Haystack
Weiyun Wang
Shuibo Zhang
Yiming Ren
Yuchen Duan
Tiantong Li
...
Ping Luo
Yu Qiao
Jifeng Dai
Wenqi Shao
Wenhai Wang
VLM
57
16
0
11 Jun 2024
Vript: A Video Is Worth Thousands of Words
Dongjie Yang
Suyuan Huang
Chengqiang Lu
Xiaodong Han
Haoxin Zhang
Yan Gao
Yao Hu
Hai Zhao
VGen
55
21
0
10 Jun 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Lin Chen
Xilin Wei
Jinsong Li
Xiaoyi Dong
Pan Zhang
...
Li Yuan
Yu Qiao
Dahua Lin
Feng Zhao
Jiaqi Wang
69
138
0
06 Jun 2024
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Chaoyou Fu
Yuhan Dai
Yongdong Luo
Lei Li
Shuhuai Ren
...
Tong Bill Xu
Xiawu Zheng
Enhong Chen
Rongrong Ji
Xing Sun
VLM
MLLM
41
216
0
31 May 2024
MotionLLM: Understanding Human Behaviors from Human Motions and Videos
Ling-Hao Chen
Shunlin Lu
Ailing Zeng
Hao Zhang
Benyou Wang
Ruimao Zhang
Lei Zhang
45
33
0
30 May 2024
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
...
Ping-Chia Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
47
31
0
26 May 2024
CinePile: A Long Video Question Answering Dataset and Benchmark
Ruchit Rawal
Khalid Saifullah
Ronen Basri
David Jacobs
Gowthami Somepalli
Tom Goldstein
38
39
0
14 May 2024
How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs
Muhammad Uzair Khattak
Muhammad Ferjad Naeem
Jameel Hassan
Muzammal Naseer
Federico Tombari
Fahad Shahbaz Khan
Salman Khan
LRM
ELM
32
10
0
06 May 2024
MANTIS: Interleaved Multi-Image Instruction Tuning
Dongfu Jiang
Xuan He
Huaye Zeng
Cong Wei
Max W.F. Ku
Qian Liu
Wenhu Chen
VLM
MLLM
28
32
0
02 May 2024
MileBench: Benchmarking MLLMs in Long Context
Dingjie Song
Shunian Chen
Guiming Hardy Chen
Fei Yu
Xiang Wan
Benyou Wang
VLM
61
34
0
29 Apr 2024
MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition
Zheng Lian
Haiyang Sun
Licai Sun
Zhuofan Wen
Siyuan Zhang
...
Bin Liu
Erik Cambria
Guoying Zhao
Björn W. Schuller
Jianhua Tao
VLM
31
11
0
26 Apr 2024
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Lin Xu
Yilin Zhao
Daquan Zhou
Zhijie Lin
See Kiong Ng
Jiashi Feng
MLLM
VLM
34
108
0
25 Apr 2024
Energy-Latency Manipulation of Multi-modal Large Language Models via Verbose Samples
Kuofeng Gao
Jindong Gu
Yang Bai
Shu-Tao Xia
Philip H. S. Torr
Wei Liu
Zhifeng Li
56
11
0
25 Apr 2024
Pegasus-v1 Technical Report
Raehyuk Jung
Hyojun Go
Jaehyuk Yi
Jiho Jang
Daniel Kim
...
Maninder Saini
Meredith Sanders
Soyoung Lee
Sue Kim
Travis Couture
MLLM
VLM
26
5
0
23 Apr 2024
Movie101v2: Improved Movie Narration Benchmark
Zihao Yue
Yepeng Zhang
Ziheng Wang
Qin Jin
VGen
19
1
0
20 Apr 2024
From Image to Video, what do we need in multimodal LLMs?
Suyuan Huang
Haoxin Zhang
Yan Gao
Yao Hu
Zengchang Qin
VLM
34
8
0
18 Apr 2024
CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes
Paritosh Parmar
Eric Peh
Ruirui Chen
Ting En Lam
Yuhan Chen
Elston Tan
Basura Fernando
CML
22
7
0
01 Apr 2024
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
Ruohong Zhang
Liangke Gui
Zhiqing Sun
Yihao Feng
Keyang Xu
...
Di Fu
Chunyuan Li
Alexander G. Hauptmann
Yonatan Bisk
Yiming Yang
MLLM
41
57
0
01 Apr 2024
ST-LLM: Large Language Models Are Effective Temporal Learners
Ruyang Liu
Chen Li
Haoran Tang
Yixiao Ge
Ying Shan
Ge Li
27
68
0
30 Mar 2024
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM
Wonkyun Kim
Changin Choi
Wonseok Lee
Wonjong Rhee
VLM
40
50
0
27 Mar 2024
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
Yi Wang
Kunchang Li
Xinhao Li
Jiashuo Yu
Yinan He
...
Hongjie Zhang
Yifei Huang
Yu Qiao
Yali Wang
Limin Wang
24
104
0
22 Mar 2024
HawkEye: Training Video-Text LLMs for Grounding Text in Videos
Yueqian Wang
Xiaojun Meng
Jianxin Liang
Yuxuan Wang
Qun Liu
Dongyan Zhao
20
30
0
15 Mar 2024
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
Qilang Ye
Zitong Yu
Rui Shao
Xinyu Xie
Philip H. S. Torr
Xiaochun Cao
MLLM
30
24
0
07 Mar 2024
TempCompass: Do Video LLMs Really Understand Videos?
Yuanxin Liu
Shicheng Li
Yi Liu
Yuxiang Wang
Shuhuai Ren
Lei Li
Sishuo Chen
Xu Sun
Lu Hou
VLM
41
98
0
01 Mar 2024
TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning
Kate Sanders
Nathaniel Weir
Benjamin Van Durme
LRM
31
11
0
29 Feb 2024
Slot-VLM: SlowFast Slots for Video-Language Modeling
Jiaqi Xu
Cuiling Lan
Wenxuan Xie
Xuejin Chen
Yan Lu
MLLM
VLM
32
7
0
20 Feb 2024
LVCHAT: Facilitating Long Video Comprehension
Yu-Xiang Wang
Zeyuan Zhang
Julian McAuley
Zexue He
VLM
26
2
0
19 Feb 2024
Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering
David Romero
Thamar Solorio
96
1
0
16 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
120
106
0
08 Feb 2024
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Shoubin Yu
Jaehong Yoon
Mohit Bansal
67
4
0
08 Feb 2024
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
Yang Jin
Zhicheng Sun
Kun Xu
Kun Xu
Liwei Chen
...
Yuliang Liu
Di Zhang
Yang Song
Kun Gai
Yadong Mu
VGen
47
42
0
05 Feb 2024
Learning to Visually Connect Actions and their Effects
Eric Peh
Paritosh Parmar
Basura Fernando
22
2
0
19 Jan 2024
LightHouse: A Survey of AGI Hallucination
Feng Wang
LRM
HILM
VLM
19
3
0
08 Jan 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Ping Luo
Jiebo Luo
Chenliang Xu
VLM
47
76
0
29 Dec 2023