Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
31 May 2024
Chaoyou Fu
Yuhan Dai
Yongdong Luo
Lei Li
Shuhuai Ren
Renrui Zhang
Zihan Wang
Chenyu Zhou
Chunjiang Ge
Mengdan Zhang
Peixian Chen
Yanwei Li
Shaohui Lin
Zhengye Zhang
Ke Li
Tong Xu
Xiawu Zheng
Enhong Chen
Caifeng Shan
Xing Sun
    VLM, MLLM

Papers citing "Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis"

50 / 550 papers shown
JRDB-Reasoning: A Difficulty-Graded Benchmark for Visual Reasoning in Robotics
Simindokht Jahangard
Mehrzad Mohammadi
Yi Shen
Zhixi Cai
Hamid Rezatofighi
295
2
0
14 Aug 2025
Empowering Multimodal LLMs with External Tools: A Comprehensive Survey
Wenbin An
Jiahao Nie
Yaqiang Wu
Feng Tian
Shijian Lu
Q. Zheng
MLLM
188
1
0
14 Aug 2025
HumanSense: From Multimodal Perception to Empathetic Context-Aware Responses through Reasoning MLLMs
European Workshop on Visual Information Processing (EUVIP), 2025
Zheng Qin
Ruobing Zheng
Yabing Wang
Tianqi Li
Yi Yuan
Jingdong Chen
Le Wang
LRM
240
2
0
14 Aug 2025
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Shilong Li
Xingyuan Bu
Wenjie Wang
Jiaheng Liu
Jun Dong
...
Wenhao Huang
Wangchunshu Zhou
Zhaoxiang Zhang
Ruizhe Ding
Shilei Wen
LLMAG, LRM
319
6
0
14 Aug 2025
LLMC+: Benchmarking Vision-Language Model Compression with a Plug-and-play Toolkit
Chengtao Lv
Bilang Zhang
Yang Yong
Yazhe Niu
Yushi Huang
Shiqiao Gu
Jiajun Wu
Yumeng Shi
Jinyang Guo
Wenya Wang
MLLM, VLM
172
0
0
13 Aug 2025
Episodic Memory Representation for Long-form Video Understanding
Yun Wang
Long Zhang
Jingren Liu
Jiaqi Yan
Zhanjie Zhang
Jiahao Zheng
Xun Yang
Dapeng Wu
Xiangyu Chen
Xuelong Li
145
4
0
13 Aug 2025
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Lin Long
Yexiao He
Wentao Ye
Yiyuan Pan
Yuan Lin
Hang Li
Junbo Zhao
Wei Li
363
9
0
13 Aug 2025
KFFocus: Highlighting Keyframes for Enhanced Video Understanding
Ming-Jun Nie
Chunwei Wang
Hang Xu
Li Zhang
VGen
106
0
0
12 Aug 2025
Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance
Yuchu Jiang
Jian Zhao
Yuchen Yuan
Tianle Zhang
Yao Huang
...
Ya Zhang
Shuicheng Yan
Chi Zhang
Z. He
Xuelong Li
SILM
469
3
0
12 Aug 2025
MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models
Fan Zhang
Minghan Li
Chong Deng
Xue Yang
Zheng Lian
...
Xian Wu
Kun Wang
Xiangang Li
Jieping Ye
Pheng-Ann Heng
AI4MH
168
4
0
11 Aug 2025
AURA: A Fine-Grained Benchmark and Decomposed Metric for Audio-Visual Reasoning
Siminfar Samakoush Galougah
Rishie Raj
Sanjoy Chowdhury
Sayan Nag
Ramani Duraiswami
195
4
0
10 Aug 2025
VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding
Jianxiang He
Shaoguang Wang
Weiyu Guo
Ziyang Chen
Yijie Xu
227
0
0
09 Aug 2025
VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization
Sihan Yang
Runsen Xu
Chenhang Cui
Tai Wang
Dahua Lin
Jiangmiao Pang
145
3
0
07 Aug 2025
A Metric for MLLM Alignment in Large-scale Recommendation
Yubin Zhang
Yanhua Huang
Haiming Xu
Mingliang Qi
Chang Wang
Jiarui Jin
Xiangyuan Ren
Xiaodan Wang
Ruiwen Xu
OffRL
129
0
0
07 Aug 2025
Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning
H. Zhang
Xin Gu
Jiawen Li
Chixiang Ma
Sule Bai
Chubin Zhang
Bowen Zhang
Zhichao Zhou
Dongliang He
Yansong Tang
OffRL, LRM
214
29
0
06 Aug 2025
Training-Free Multimodal Large Language Model Orchestration
Tianyu Xie
Yuhang Wu
Yongdong Luo
Jinfa Huang
Xiawu Zheng
163
0
0
06 Aug 2025
TSPO: Temporal Sampling Policy Optimization for Long-form Video Language Understanding
Canhui Tang
Zifan Han
Hongbo Sun
Sanping Zhou
Xuchong Zhang
Xin Wei
Ye Yuan
Huayu Zhang
Jinglin Xu
Hao Sun
397
6
0
06 Aug 2025
AVATAR: Reinforcement Learning to See, Hear, and Reason Over Video
Yogesh Kulkarni
Pooyan Fazli
OffRL, LRM
308
4
0
05 Aug 2025
VideoForest: Person-Anchored Hierarchical Reasoning for Cross-Video Question Answering
Yiran Meng
Junhong Ye
Wei Zhou
Guanghui Yue
Xudong Mao
Ruomei Wang
Baoquan Zhao
121
0
0
05 Aug 2025
Enhancing Long Video Question Answering with Scene-Localized Frame Grouping
Xuyi Yang
Wenhao Zhang
Hongbo Jin
Lin Liu
Hongbo Xu
Yongwei Nie
Fei Richard Yu
Fei Ma
195
1
0
05 Aug 2025
VLM4D: Towards Spatiotemporal Awareness in Vision Language Models
Shijie Zhou
Alexander Vilesov
Xuehai He
Ziyu Wan
Shuwang Zhang
Aditya Nagachandra
Di Chang
DongDong Chen
Xin Eric Wang
A. Kadambi
VLM
192
0
0
04 Aug 2025
Free-MoRef: Instantly Multiplexing Context Perception Capabilities of Video-MLLMs within Single Inference
Kuo Wang
Quanlong Zheng
Junlin Xie
Yanhao Zhang
Jinguo Luo
Haonan Lu
Guanbin Li
Fan Zhou
VLM
106
1
0
04 Aug 2025
TimeExpert: An Expert-Guided Video LLM for Video Temporal Grounding
Zuhao Yang
Yingchen Yu
Yunqing Zhao
Shijian Lu
Song Bai
136
2
0
03 Aug 2025
StreamAgent: Towards Anticipatory Agents for Streaming Video Understanding
Haolin Yang
Feilong Tang
Linxiao Zhao
Xiang An
Ming Hu
...
Yifan Lu
Xiaofeng Zhang
Abdalla Swikir
Junjun He
Zongyuan Ge
362
5
0
03 Aug 2025
Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval
Dohwan Ko
Ji Soo Lee
M. Choi
Zihang Meng
Hyunwoo J. Kim
384
1
0
31 Jul 2025
ISO-Bench: Benchmarking Multimodal Causal Reasoning in Visual-Language Models through Procedural Plans
Ananya Sadana
Yash Kumar Lal
Jiawei Zhou
CML, VLM
170
0
0
30 Jul 2025
ReGATE: Learning Faster and Better with Fewer Tokens in MLLMs
Chaoyu Li
Yogesh Kulkarni
Pooyan Fazli
214
0
0
29 Jul 2025
A Survey of Token Compression for Efficient Multimodal Large Language Models
Kele Shao
Keda Tao
Kejia Zhang
Sicheng Feng
Mu Cai
Yuzhang Shang
Haoxuan You
Can Qin
Yang Sui
Huan Wang
522
12
0
27 Jul 2025
MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks
Sara Papi
Maike Züfle
Marco Gaido
Beatrice Savoldi
Danni Liu
Ioannis Douros
L. Bentivogli
Jan Niehues
300
4
0
25 Jul 2025
EgoExoBench: A Benchmark for First- and Third-person View Video Understanding in MLLMs
Yuping He
Yifei Huang
Guo Chen
Baoqi Pei
Jilan Xu
Tong Lu
Jiangmiao Pang
EgoV
240
10
0
24 Jul 2025
Toward Scalable Video Narration: A Training-free Approach Using Multimodal Large Language Models
Tz-Ying Wu
Tahani Trigui
S. N. Sridhar
Anand Bodas
Subarna Tripathi
105
1
0
22 Jul 2025
CausalStep: A Benchmark for Explicit Stepwise Causal Reasoning in Videos
Xuchen Li
Xuzhao Li
Shiyu Hu
Kaiqi Huang
Wentao Zhang
CML, ELM, LRM
268
5
0
22 Jul 2025
Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding
Yuanhan Zhang
Yunice Chew
Yuhao Dong
Aria Leo
Bo Hu
Yu Qiao
ELM
201
4
0
20 Jul 2025
Infinite Video Understanding
Dell Zhang
Xiangyu Chen
Jixiang Luo
Mengxi Jia
Changzhi Sun
Ruilong Ren
Jingren Liu
Hao Sun
Xuelong Li
VLM
250
1
0
11 Jul 2025
Scaling RL to Long Videos
Yukang Chen
Wei Huang
Baifeng Shi
Qinghao Hu
Hanrong Ye
...
Xiaojuan Qi
Sifei Liu
Hongxu Yin
Yao Lu
Song Han
OffRL, AI4TS, VLM, LRM
427
38
0
10 Jul 2025
Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning
Ziyang Wang
Jaehong Yoon
Shoubin Yu
Md. Mohaiminul Islam
Gedas Bertasius
Mohit Bansal
OffRL, LRM
281
6
0
09 Jul 2025
Spatio-Temporal LLM: Reasoning about Environments and Actions
Haozhen Zheng
Beitong Tian
Mingyuan Wu
Zhenggang Tang
Klara Nahrstedt
Alex Schwing
LRM
212
3
0
07 Jul 2025
HumanVideo-MME: Benchmarking MLLMs for Human-Centric Video Understanding
Rui Yu
J. Zhang
Zhenye Gan
Qingdong He
Xiaobin Hu
...
Chengjie Wang
Zhucun Xue
Chaoyou Fu
Xinwei He
Xiang Bai
VLM
138
0
0
07 Jul 2025
Animation Needs Attention: A Holistic Approach to Slides Animation Comprehension with Visual-Language Models
Yifan Jiang
Yibo Xue
Yukun Kang
Pin Zheng
Jian Peng
Feiran Wu
Changliang Xu
DiffM, VGen
258
0
0
05 Jul 2025
UVLM: Benchmarking Video Language Model for Underwater World Understanding
Xizhe Xue
Yang Zhou
Dawei Yan
Lijie Tao
Junjie Li
Ying Li
Haokui Zhang
Rong Xiao
VLM
197
2
0
03 Jul 2025
AuroraLong: Bringing RNNs Back to Efficient Open-Ended Video Understanding
Weili Xu
Enxin Song
Wenhao Chai
Xuexiang Wen
Tian-Chun Ye
Gaoang Wang
341
5
0
03 Jul 2025
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
GLM-V Team
Wenyi Hong
Wenmeng Yu
Xiaohan Zhang
G. Wang
...
Bin Xu
J. Li
Minlie Huang
Yuxiao Dong
Jie Tang
MLLM, ReLM, LRM, VLM
564
11
0
01 Jul 2025
InstructionBench: An Instructional Video Understanding Benchmark
Haiwan Wei
Yitian Yuan
Xiaohan Lan
Wei Ke
Lin Ma
ELM
331
3
0
01 Jul 2025
Flash-VStream: Efficient Real-Time Understanding for Long Video Streams
Haoji Zhang
Yiqin Wang
Yansong Tang
Yong-Jin Liu
Jiashi Feng
Xiaojie Jin
VLM
269
11
0
30 Jun 2025
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs
Shaojie Zhang
Jiahui Yang
Jianqin Yin
Zhenbo Luo
Jian Luan
360
23
0
27 Jun 2025
ImplicitQA: Going beyond frames towards Implicit Video Reasoning
S. Swetha
Rohit Gupta
P. Kulkarni
David G. Shatwell
Jeffrey A. Chan-Santiago
Nyle Siddiqui
Joseph Fioresi
Mubarak Shah
137
3
0
26 Jun 2025
PEVLM: Parallel Encoding for Vision-Language Models
Letian Kang
Shixian Luo
Yiqiang Li
Yuxin Yin
Shenxuan Zhou
Xiaoyang Yu
Jin Yang
Yong Wu
MLLM, VLM
245
0
0
24 Jun 2025
LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation
Tongtian Yue
Longteng Guo
Yepeng Tang
Zijia Zhao
Xinxin Zhu
Hua Huang
Jing Liu
MLLM, VLM
173
2
0
20 Jun 2025
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
Yi Chen
Yuying Ge
Rui Wang
Yixiao Ge
Junhao Cheng
Mingyu Ding
Xihui Liu
OffRL, VLM, LRM
178
23
0
19 Jun 2025
FindingDory: A Benchmark to Evaluate Memory in Embodied Agents
Karmesh Yadav
Yusuf Ali
Gunshi Gupta
Y. Gal
Z. Kira
LM&Ro
276
2
0
18 Jun 2025