Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

31 May 2024
Chaoyou Fu
Yuhan Dai
Yongdong Luo
Lei Li
Shuhuai Ren
Renrui Zhang
Zihan Wang
Chenyu Zhou
Chunjiang Ge
Mengdan Zhang
Peixian Chen
Yanwei Li
Shaohui Lin
Zhengye Zhang
Ke Li
Tong Xu
Xiawu Zheng
Enhong Chen
Caifeng Shan
Xing Sun
VLM MLLM

Papers citing "Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis"

50 / 550 papers shown
FindingDory: A Benchmark to Evaluate Memory in Embodied Agents
Karmesh Yadav
Yusuf Ali
Gunshi Gupta
Y. Gal
Z. Kira
LM&Ro
258
2
0
18 Jun 2025
video-SALMONN 2: Caption-Enhanced Audio-Visual Large Language Models
Changli Tang
Yixuan Li
Yudong Yang
Jimin Zhuang
Guangzhi Sun
Wei Li
Zejun Ma
Chao Zhang
384
2
0
18 Jun 2025
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
Ziniu Zhang
Ruiqi Wang
Hongming Guo
Penghao Wu
Yuhao Dong
Xiuying Wang
Jingkang Yang
Hao Zhang
Hongyuan Zhu
Ziwei Liu
RALM LRM
264
17
0
16 Jun 2025
AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding
Zhucun Xue
Jiangning Zhang
Xurong Xie
Yuxuan Cai
Yong-Jin Liu
Xiangtai Li
Dacheng Tao
VGen VLM
371
5
0
16 Jun 2025
MambaMia: A State-Space-Model-Based Compression for Efficient Video Understanding in Large Multimodal Models
Geewook Kim
Minjoon Seo
239
1
0
16 Jun 2025
SmartHome-Bench: A Comprehensive Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal Large Language Models
Xinyi Zhao
Congjing Zhang
Pei Guo
Wei Li
Lin Chen
Chaoyue Zhao
Shuai Huang
192
2
0
15 Jun 2025
Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding
Youze Wang
Zijun Chen
Ruoyu Chen
Shishen Gu
Yinpeng Dong
...
Jun Zhu
Meng Wang
Richang Hong
Wenbo Hu
364
0
0
14 Jun 2025
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
Jiashuo Yu
Y. Wu
Meng Chu
Zhifei Ren
Z. Huang
...
Conghui He
Yu Qiao
Yali Wang
Yi Wang
L. Wang
LRM
458
8
0
12 Jun 2025
Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs
Qizhe Zhang
Mengzhen Liu
Lichen Li
Ming Lu
Yuan Zhang
Junwen Pan
Qi She
Shanghang Zhang
VLM
399
18
0
12 Jun 2025
Think With Videos For Agentic Long-Video Understanding
Huaying Yuan
Zheng Liu
Junjie Zhou
Yan Shu
Andrii Zadaianchuk
Ji-Rong Wen
Zhicheng Dou
VLM
539
1
0
12 Jun 2025
Vision Generalist Model: A Survey
International Journal of Computer Vision (IJCV), 2025
Ziyi Wang
Yongming Rao
Shuofeng Sun
Xinrun Liu
Yi Wei
...
Zuyan Liu
Yanbo Wang
Hongmin Liu
Jie Zhou
Jiwen Lu
293
0
0
11 Jun 2025
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
Jinyoung Park
Jeehye Na
Jinyoung Kim
H. Kim
OffRL
358
21
0
09 Jun 2025
ARGUS: Hallucination and Omission Evaluation in Video-LLMs
Ruchit Rawal
Reza Shirkavand
Heng-Chiao Huang
Gowthami Somepalli
Tom Goldstein
280
3
0
09 Jun 2025
MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks
Sanjoy Chowdhury
Mohamed Elmoghany
Yohan Abeysinghe
Mahmoud Ahmed
Sayan Nag
Salman Khan
Mohamed Elhoseiny
Dinesh Manocha
361
5
0
08 Jun 2025
A Culturally-diverse Multilingual Multimodal Video Benchmark & Model
Bhuiyan Sanjid Shafique
Ashmal Vayani
Muhammad Maaz
H. Rasheed
Dinura Dissanayake
...
Shin'ichi Satoh
Michael Felsberg
M. Shah
Salman Khan
Fahad Shahbaz Khan
VLM
316
3
0
08 Jun 2025
Mitigating Behavioral Hallucination in Multimodal Large Language Models for Sequential Images
Liangliang You
Junchi Yao
Shu Yang
Guimin Hu
Lijie Hu
Di Wang
MLLM
259
2
0
08 Jun 2025
How Important are Videos for Training Video LLMs?
George Lydakis
Alexander Hermans
A. Athar
Daan de Geus
Bastian Leibe
VLM
164
0
0
07 Jun 2025
CLaMR: Contextualized Late-Interaction for Multimodal Content Retrieval
David Wan
Han Wang
Elias Stengel-Eskin
Jaemin Cho
Mohit Bansal
VLM
227
2
0
06 Jun 2025
ExAct: A Video-Language Benchmark for Expert Action Analysis
Han Yi
Yulu Pan
Feihong He
Xinyu Liu
Benjamin Zhang
Oluwatumininu Oguntola
Gedas Bertasius
201
1
0
06 Jun 2025
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos
H. Rasheed
Abdelrahman M. Shaker
Anqi Tang
Muhammad Maaz
Ming-Hsuan Yang
Salman Khan
Fahad A Khan
AIMat
498
9
0
05 Jun 2025
APVR: Hour-Level Long Video Understanding with Adaptive Pivot Visual Information Retrieval
Hong-xia Gao
Yiming Bao
Xuezhan Tu
Bin Zhong
Linan Yue
Minling Zhang
334
1
0
05 Jun 2025
MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos
Kejian Zhu
Zhuoran Jin
Hongbang Yuan
Jiachun Li
Shangqing Tu
Pengfei Cao
Yubo Chen
Kang Liu
Jun Zhao
VLM LRM
210
7
0
04 Jun 2025
MiMo-VL Technical Report
Xiaomi LLM-Core Team
Zihao Yue
Zhenru Lin
Yifan Song
Weikun Wang
...
Di Zhang
Chong Ma
Chang Liu
Can Cai
Bingquan Xia
OffRL MoE VLM LRM
255
15
0
04 Jun 2025
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
Mengdi Jia
Zekun Qi
Shaochen Zhang
Wenyao Zhang
Xinqiang Yu
Jiawei He
He Wang
L. Yi
LRM VLM
331
28
0
03 Jun 2025
VideoCap-R1: Enhancing MLLMs for Video Captioning via Structured Thinking
Desen Meng
Rui Huang
Zhilin Dai
Xinhao Li
Yifan Xu
...
Z. Huang
Meng Zhang
L. Zhang
Lu Dong
Limin Wang
OffRL VLM LRM
270
12
0
02 Jun 2025
Is Extending Modality The Right Path Towards Omni-Modality?
Tinghui Zhu
Kai Zhang
Muhao Chen
Eric Fosler-Lussier
VLM
281
3
0
02 Jun 2025
NavBench: Probing Multimodal Large Language Models for Embodied Navigation
Yanyuan Qiao
Haodong Hong
Wenqi Lyu
Dong An
Siqi Zhang
Yutong Xie
Xinyu Wang
Qi Wu
LM&Ro
250
4
0
01 Jun 2025
Generic Token Compression in Multimodal Large Language Models from an Explainability Perspective
Lei Lei
Jie Gu
Xiaokang Ma
Chu Tang
Jingmin Chen
Tong Xu
244
1
0
01 Jun 2025
FlexSelect: Flexible Token Selection for Efficient Long Video Understanding
Yunzhu Zhang
Yu Lu
T. Wang
Fengyun Rao
Yi Yang
Linchao Zhu
VLM
231
7
0
01 Jun 2025
Deep Temporal Reasoning in Video Language Models: A Cross-Linguistic Evaluation of Action Duration and Completion through Perfect Times
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Olga Loginova
Sofía Ortega Loguinova
LRM
160
0
0
01 Jun 2025
Vid2Coach: Transforming How-To Videos into Task Assistants
ACM Symposium on User Interface Software and Technology (UIST), 2025
Mina Huh
Zihui Xue
Ujjaini Das
Kumar Ashutosh
Kristen Grauman
Amy Pavel
246
4
0
31 May 2025
Chain-of-Frames: Advancing Video Understanding in Multimodal LLMs via Frame-Aware Reasoning
Sara Ghazanfari
Francesco Croce
Nicolas Flammarion
Prashanth Krishnamurthy
Farshad Khorrami
S. Garg
LRM
189
9
0
31 May 2025
DisTime: Distribution-based Time Representation for Video Large Language Models
Yingsen Zeng
Zepeng Huang
Yujie Zhong
Chengjian Feng
Jie Hu
Lin Ma
Yang Liu
VGen
254
4
0
30 May 2025
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors
Duo Zheng
Shijia Huang
Yanyang Li
Liwei Wang
383
23
0
30 May 2025
SiLVR: A Simple Language-based Video Reasoning Framework
Ce Zhang
Yan-Bo Lin
Ziyang Wang
Mohit Bansal
Gedas Bertasius
LRM
188
7
0
30 May 2025
Reinforcing Video Reasoning with Focused Thinking
Jisheng Dang
Jingze Wu
T. Wang
Xuanhui Lin
Nannan Zhu
Hongbo Chen
Wei-Shi Zheng
Meng Wang
Tat-Seng Chua
OffRL LRM
340
12
0
30 May 2025
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
Y. Liu
Kun Ouyang
Haoning Wu
Yi Liu
Lin Sui
Xinhao Li
Y. Zhong
Y. Charles
Xinyu Zhou
Xu Sun
VLM LRM
275
4
0
29 May 2025
VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Tingyu Song
Tongyan Hu
Guo Gan
Yilun Zhao
264
0
0
29 May 2025
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory
Chenhao Zheng
Jieyu Zhang
Mohammadreza Salehi
Ziqi Gao
Vishnu Iyengar
Norimasa Kobori
Quan Kong
Ranjay Krishna
378
2
0
29 May 2025
VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation
Shi-Xue Zhang
Hongfa Wang
Duojun Huang
Xin Li
Xiaobin Zhu
Xu-Cheng Yin
CoGe
275
5
0
29 May 2025
VidText: Towards Comprehensive Evaluation for Video Text Understanding
Zhoufaran Yang
Yan Shu
Zhifei Yang
Yan Zhang
...
Gangyan Zeng
Yu Zhou
Andrii Zadaianchuk
Nicu Sebe
CoGe
350
4
0
28 May 2025
Fostering Video Reasoning via Next-Event Prediction
Haonan Wang
Hongfu Liu
Xiangyan Liu
C. Du
Kenji Kawaguchi
Ye Wang
Tianyu Pang
AI4TS LRM
206
4
0
28 May 2025
HoliTom: Holistic Token Merging for Fast Video Large Language Models
Kele Shao
Keda Tao
Can Qin
Haoxuan You
Yang Sui
Huan Wang
VLM
648
15
0
27 May 2025
AdaTP: Attention-Debiased Token Pruning for Video Large Language Models
Fengyuan Sun
Leqi Shen
Hui Chen
Sicheng Zhao
Jungong Han
Guiguang Ding
VLM
200
2
0
26 May 2025
TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Fanheng Kong
Jingyuan Zhang
Hongzhi Zhang
Shi Feng
Daling Wang
Linhao Yu
Xingguang Ji
Yu Tian
Qi Wang
Fuzheng Zhang
292
2
0
26 May 2025
Two Causally Related Needles in a Video Haystack
Miaoyu Li
Qin Chao
Boyang Albert Li
CML
301
0
0
26 May 2025
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Gudied Iterative Policy Optimization
Yunxin Li
Xinyu Chen
Zitao Li
Zhenyu Liu
L. Wang
Tong Lu
Baotian Hu
Min Zhang
OffRL LRM
398
8
0
25 May 2025
RTime-QA: A Benchmark for Atomic Temporal Event Understanding in Large Multi-modal Models
Yuqi Liu
Qin Jin
Tianyuan Qu
Xuan Liu
Yang Du
Bei Yu
Jiaya Jia
397
0
0
25 May 2025
Sparse-to-Dense: A Free Lunch for Lossless Acceleration of Video Understanding in LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xuan Zhang
Cunxiao Du
Sicheng Yu
Jiawei Wu
Fengzhuo Zhang
Wei Gao
Qian Liu
232
1
0
25 May 2025
ToDRE: Effective Visual Token Pruning via Token Diversity and Task Relevance
Duo Li
Zuhao Yang
Xiaoqin Zhang
Ling Shao
Shijian Lu
VLM
494
1
0
24 May 2025