arXiv: 2405.21075
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

31 May 2024
Chaoyou Fu
Yuhan Dai
Yongdong Luo
Lei Li
Shuhuai Ren
Renrui Zhang
Zihan Wang
Chenyu Zhou
Chunjiang Ge
Mengdan Zhang
Peixian Chen
Yanwei Li
Shaohui Lin
Zhengye Zhang
Ke Li
Tong Xu
Xiawu Zheng
Enhong Chen
Caifeng Shan
Xing Sun
VLM, MLLM

Papers citing "Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis"

50 / 550 papers shown
K-frames: Scene-Driven Any-k Keyframe Selection for long video understanding
Yifeng Yao
Yike Yun
Jing Wang
Huishuai Zhang
Dongyan Zhao
Ke Tian
Zhihao Wang
Minghui Qiu
Tao Wang
CLIP, VGen
136
1
0
14 Oct 2025
VideoLucy: Deep Memory Backtracking for Long Video Understanding
Jialong Zuo
Yongtai Deng
Lingdong Kong
J. Yang
Rui Jin
Y. Zhang
Nong Sang
Liang Pan
Ziwei Liu
Changxin Gao
141
2
0
14 Oct 2025
Scaling Language-Centric Omnimodal Representation Learning
Chenghao Xiao
Hou Pong Chan
Hao Zhang
Weiwen Xu
Mahani Aljunied
Yu Rong
139
0
0
13 Oct 2025
ExpVid: A Benchmark for Experiment Video Understanding & Reasoning
Yicheng Xu
Y. Wu
Jiashuo Yu
Ziang Yan
Tianxiang Jiang
...
Kai Chen
Yu Qiao
Limin Wang
Manabu Okumura
Y. Wang
LRM
140
1
0
13 Oct 2025
Video-STR: Reinforcing MLLMs in Video Spatio-Temporal Reasoning with Relation Graph
Wentao Wang
Heqing Zou
Tianze Luo
Rui Huang
Yutian Zhao
...
Hansheng Zhang
C. Qin
Yan Wang
Tianyuan Chen
Huaijian Zhang
AI4TS
296
0
0
13 Oct 2025
video-SALMONN S: Streaming Audio-Visual LLMs Beyond Length Limits via Memory
Guangzhi Sun
Yixuan Li
Xiaodong Wu
Yudong Yang
Wei Li
Zejun Ma
Chao Zhang
91
1
0
13 Oct 2025
Answer-Consistent Chain-of-thought Reinforcement Learning For Multi-modal Large Language Models
Minbin Huang
Runhui Huang
Chuanyang Zheng
Jingyao Li
Guoxuan Chen
Han Shi
Hong Cheng
KELM, LRM
125
0
0
11 Oct 2025
ChoirRec: Semantic User Grouping via LLMs for Conversion Rate Prediction of Low-Activity Users
Dakai Zhai
Jiong Gao
Boya Du
Junwei Xu
Qijie Shen
J. Zhu
Yuning Jiang
138
8
0
10 Oct 2025
MomentSeg: Moment-Centric Sampling for Enhanced Video Pixel Understanding
Ming Dai
Sen Yang
Boqiang Duan
Wankou Yang
Jingdong Wang
VOS
281
0
0
10 Oct 2025
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models
Andong Deng
Taojiannan Yang
S. Yu
Lincoln Spencer
Mohit Bansal
Chen Chen
Serena Yeung-Levy
Xiaohan Wang
LRM
135
3
0
09 Oct 2025
VideoNorms: Benchmarking Cultural Awareness of Video Language Models
Nikhil Reddy Varimalla
Yunfei Xu
Arkadiy Saakyan
Meng Fan Wang
Smaranda Muresan
VGen, VLM
193
0
0
09 Oct 2025
MARC: Memory-Augmented RL Token Compression for Efficient Video Understanding
Peiran Wu
Zhuorui Yu
Yunze Liu
Chi-Hao Wu
Enmin Zhou
Junxiao Shen
OffRL, VLM
95
1
0
09 Oct 2025
Improving Temporal Understanding Logic Consistency in Video-Language Models via Attention Enhancement
Chengzhi Li
Heyan Huang
Ping Jian
Zhen Yang
Yaning Tian
98
0
0
09 Oct 2025
Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow
Ruyang Liu
Shangkun Sun
Haoran Tang
Ge Li
Wei-Nan Gao
VGen, VLM
96
3
0
07 Oct 2025
LogSTOP: Temporal Scores over Prediction Sequences for Matching and Retrieval
Avishree Khare
Hideki Okamoto
Bardh Hoxha
Georgios Fainekos
Rajeev Alur
132
0
0
07 Oct 2025
From Learning to Mastery: Achieving Safe and Efficient Real-World Autonomous Driving with Human-In-The-Loop Reinforcement Learning
Li Zeqiao
Wang Yijing
Wang Haoyu
Li Zheng
Li Peng
Liu Wenfei
Zuo Zhiqiang
160
0
0
07 Oct 2025
When Thinking Drifts: Evidential Grounding for Robust Video Reasoning
M. Luo
Zihui Xue
Alex Dimakis
Kristen Grauman
VGen, LRM
268
4
0
07 Oct 2025
A.I.R.: Enabling Adaptive, Iterative, and Reasoning-based Frame Selection For Video Question Answering
Yuanhao Zou
Shengji Jin
Andong Deng
Youpeng Zhao
Jun Wang
Chen Chen
109
0
0
06 Oct 2025
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
Yunlong Tang
Jing Bi
Pinxin Liu
Zhenyu Pan
Mingqian Feng
...
Zeliang Zhang
Daiki Shimada
Han Liu
Jiebo Luo
Chenliang Xu
MLLM, OffRL, VLM, LRM
742
8
0
06 Oct 2025
Video-in-the-Loop: Span-Grounded Long Video QA with Interleaved Reasoning
C. Wang
Donglin Bai
Yifan Yang
Xiao Jin
Anlan Zhang
...
Jingdong Sun
Chong Luo
Ting Cao
Lili Qiu
Suman Banerjee
258
1
0
05 Oct 2025
The Artificial Intelligence Cognitive Examination: A Survey on the Evolution of Multimodal Evaluation from Recognition to Reasoning
Mayank Ravishankara
Varindra V. Persad Maharaj
ELM
202
1
0
05 Oct 2025
FrameOracle: Learning What to See and How Much to See in Videos
Chaoyu Li
Tianzhi Li
Fei Tao
Zhenyu Zhao
Ziqian Wu
Maozheng Zhao
Juntong Song
Cheng Niu
Pooyan Fazli
VLM
125
0
0
04 Oct 2025
Harnessing Synthetic Preference Data for Enhancing Temporal Understanding of Video-LLMs
Sameep Vani
Shreyas Jena
Maitreya Patel
Chitta Baral
Somak Aditya
Yezhou Yang
AI4TS, SyDa
148
0
0
04 Oct 2025
From Frames to Clips: Training-free Adaptive Key Clip Selection for Long-Form Video Understanding
Guangyu Sun
Archit Singhal
Burak Uzkent
Mubarak Shah
Chen Chen
Garin Kessler
CLIP, VLM
153
0
0
02 Oct 2025
Oracle-RLAIF: An Improved Fine-Tuning Framework for Multi-modal Video Models through Reinforcement Learning from Ranking Feedback
Derek Shi
Ruben Glatt
Christine Klymko
Shubham Mohole
Hongjun Choi
Shashank Kushwaha
Sam Sakla
Felipe Leno Da Silva
AI4TS, VLM
179
0
0
02 Oct 2025
Training-free Uncertainty Guidance for Complex Visual Tasks with MLLMs
Sanghwan Kim
Rui Xiao
Stephan Alaniz
Yongqin Xian
Zeynep Akata
127
0
0
01 Oct 2025
TimeScope: Towards Task-Oriented Temporal Grounding In Long Videos
Xiangrui Liu
Minghao Qin
Yan Shu
Zhengyang Liang
Yang Tian
Chen Jason Zhang
Bo Zhao
Zheng Liu
319
0
0
30 Sep 2025
TAMA: Tool-Augmented Multimodal Agent for Procedural Activity Understanding
Kimihiro Hasegawa
Wiradee Imrattanatrai
Masaki Asada
Ken Fukuda
Teruko Mitamura
148
0
0
30 Sep 2025
Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models
Yuansen Liu
Haiming Tang
Jinlong Peng
Jiangning Zhang
Xiaozhong Ji
...
Chaoyou Fu
Chengjie Wang
Xiaobin Hu
Shuicheng Yan
VLM
241
1
0
30 Sep 2025
V-HUB: A Visual-Centric Humor Understanding Benchmark for Video LLMs
Zhengpeng Shi
Hengli Li
Yanpeng Zhao
Jianqun Zhou
Yuxuan Wang
Qinrong Cui
Wei Bi
Songchun Zhu
Bo Zhao
Zilong Zheng
VLM
119
0
0
30 Sep 2025
AccidentBench: Benchmarking Multimodal Understanding and Reasoning in Vehicle Accidents and Beyond
Shangding Gu
Xiaohan Wang
Donghao Ying
Haoyu Zhao
Runing Yang
...
Marco Pavone
Serena Yeung-Levy
Jun Wang
Dawn Song
C. Spanos
117
0
0
30 Sep 2025
NeMo: Needle in a Montage for Video-Language Understanding
Zi-Yuan Hu
Shuo Liang
Duo Zheng
Yanyang Li
Yeyao Tao
...
Jianguang Yu
Jing-ling Huang
Meng Fang
Yin Li
Liwei Wang
170
2
0
29 Sep 2025
StreamForest: Efficient Online Video Understanding with Persistent Event Memory
Xiangyu Zeng
Kefan Qiu
Qingyu Zhang
Xinhao Li
Jing Wang
...
Kun Tian
Meng Tian
Xinhai Zhao
Yi Wang
Limin Wang
231
2
0
29 Sep 2025
VideoAnchor: Reinforcing Subspace-Structured Visual Cues for Coherent Visual-Spatial Reasoning
Zhaozhi Wang
Tong Zhang
Mingyue Guo
Yaowei Wang
QiXiang Ye
135
1
0
29 Sep 2025
When MLLMs Meet Compression Distortion: A Coding Paradigm Tailored to MLLMs
Jinming Liu
Zhaoyang Jia
J. Li
Bin Li
Xin Jin
Wenjun Zeng
Yan Lu
82
0
0
29 Sep 2025
LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning
Shenghao Fu
Q. Yang
Yuan-Ming Li
Xihan Wei
Xiaohua Xie
Wei-Shi Zheng
LRM
164
7
0
29 Sep 2025
From Perception to Cognition: A Survey of Vision-Language Interactive Reasoning in Multimodal Large Language Models
Chenyue Zhou
Mingxuan Wang
Yanbiao Ma
Chenxu Wu
Wanyi Chen
...
Guoli Jia
Lingling Li
Z. Lu
Y. Lu
Wenhan Luo
LRM
448
9
0
29 Sep 2025
FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting
Zefeng He
Xiaoye Qu
Yafu Li
Siyuan Huang
Daizong Liu
Yu Cheng
OffRL, VLM, LRM
293
7
0
29 Sep 2025
IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?
Yang Chen
Minghao Liu
Yufan Shen
Y. Li
Tianyuan Huang
...
Zhi Yu
Yongliang Shen
Yu Qiao
Ding Wang
VGen, VLM
257
0
0
29 Sep 2025
Perceive, Reflect and Understand Long Video: Progressive Multi-Granular Clue Exploration with Interactive Agents
J. Li
Kun-Juan Wei
Zhe Xu
Zibo Su
Xu Yang
Cheng Deng
142
0
0
29 Sep 2025
ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis
Congzhi Zhang
Zhibin Wang
Yinchao Ma
Jiawei Peng
Y. Wang
Qiang Zhou
Jun Song
Bo Zheng
OffRL, AI4TS, LRM
230
2
0
28 Sep 2025
Video Panels for Long Video Understanding
Lars Doorenbos
Federico Spurio
Juergen Gall
VLM
119
0
0
28 Sep 2025
FrameMind: Frame-Interleaved Video Reasoning via Reinforcement Learning
Haonan Ge
Yiwei Wang
Kai-Wei Chang
Hang Wu
Yujun Cai
LRM
249
0
0
28 Sep 2025
Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning
Yucheng Wang
Yifan Hou
Aydin Javadov
Mubashara Akhtar
Mrinmaya Sachan
LRM
129
0
0
28 Sep 2025
Evaluating point-light biological motion in multimodal large language models
Akila Kadambi
Marco Iacoboni
Lisa Aziz-Zadeh
Srini Narayanan
122
1
0
27 Sep 2025
SPIKE-RL: Video-LLMs meet Bayesian Surprise
Sahithya Ravi
Aditya Chinchure
R. Ng
Leonid Sigal
Vered Shwartz
OffRL, CML
100
0
0
27 Sep 2025
WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM
Changli Tang
Qinfan Xiao
Ke Mei
Tianyi Wang
Fengyun Rao
Chao Zhang
117
0
0
26 Sep 2025
VideoScore2: Think before You Score in Generative Video Evaluation
Xuan He
Dongfu Jiang
Ping Nie
Minghao Liu
Z. L. Jiang
...
Qunshu Lin
Yuanxing Zhang
Ge Zhang
Wenhao Huang
Wenhu Chen
EGVM, VGen, LRM
1.2K
5
0
26 Sep 2025
Lightweight Structured Multimodal Reasoning for Clinical Scene Understanding in Robotics
Saurav Jha
Stefan K. Ehrlich
LM&Ro
82
0
0
26 Sep 2025
VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception
Ziang Yan
Xinhao Li
Yinan He
Zhengrong Yue
Xiangyu Zeng
Yali Wang
Yu Qiao
Limin Wang
Yi Wang
MLLM, VLM, LRM
213
13
0
25 Sep 2025