Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

31 May 2024

Papers citing "Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis"

50 / 550 papers shown

K-frames: Scene-Driven Any-k Keyframe Selection for long video understanding

136

14 Oct 2025

VideoLucy: Deep Memory Backtracking for Long Video Understanding

141

14 Oct 2025

Scaling Language-Centric Omnimodal Representation Learning

139

13 Oct 2025

ExpVid: A Benchmark for Experiment Video Understanding & Reasoning

...

140

13 Oct 2025

Video-STR: Reinforcing MLLMs in Video Spatio-Temporal Reasoning with Relation Graph

...

296

13 Oct 2025

video-SALMONN S: Streaming Audio-Visual LLMs Beyond Length Limits via Memory

13 Oct 2025

Answer-Consistent Chain-of-thought Reinforcement Learning For Multi-modal Large Language Models

125

11 Oct 2025

ChoirRec: Semantic User Grouping via LLMs for Conversion Rate Prediction of Low-Activity Users

138

10 Oct 2025

MomentSeg: Moment-Centric Sampling for Enhanced Video Pixel Understanding

281

10 Oct 2025

SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models

135

09 Oct 2025

VideoNorms: Benchmarking Cultural Awareness of Video Language Models

Nikhil Reddy Varimalla

193

09 Oct 2025

MARC: Memory-Augmented RL Token Compression for Efficient Video Understanding

09 Oct 2025

Improving Temporal Understanding Logic Consistency in Video-Language Models via Attention Enhancement

09 Oct 2025

Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow

07 Oct 2025

LogSTOP: Temporal Scores over Prediction Sequences for Matching and Retrieval

132

07 Oct 2025

From Learning to Mastery: Achieving Safe and Efficient Real-World Autonomous Driving with Human-In-The-Loop Reinforcement Learning

160

07 Oct 2025

When Thinking Drifts: Evidential Grounding for Robust Video Reasoning

268

07 Oct 2025

A.I.R.: Enabling Adaptive, Iterative, and Reasoning-based Frame Selection For Video Question Answering

109

06 Oct 2025

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

...

742

06 Oct 2025

Video-in-the-Loop: Span-Grounded Long Video QA with Interleaved Reasoning

...

258

05 Oct 2025

The Artificial Intelligence Cognitive Examination: A Survey on the Evolution of Multimodal Evaluation from Recognition to Reasoning

Mayank Ravishankara

Varindra V. Persad Maharaj

ELM

202

05 Oct 2025

FrameOracle: Learning What to See and How Much to See in Videos

125

04 Oct 2025

Harnessing Synthetic Preference Data for Enhancing Temporal Understanding of Video-LLMs

148

04 Oct 2025

From Frames to Clips: Training-free Adaptive Key Clip Selection for Long-Form Video Understanding

153

02 Oct 2025

Oracle-RLAIF: An Improved Fine-Tuning Framework for Multi-modal Video Models through Reinforcement Learning from Ranking Feedback

179

02 Oct 2025

Training-free Uncertainty Guidance for Complex Visual Tasks with MLLMs

127

01 Oct 2025

TimeScope: Towards Task-Oriented Temporal Grounding In Long Videos

319

30 Sep 2025

TAMA: Tool-Augmented Multimodal Agent for Procedural Activity Understanding

Kimihiro Hasegawa

Wiradee Imrattanatrai

Masaki Asada

Ken Fukuda

Teruko Mitamura

148

30 Sep 2025

Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models

...

241

30 Sep 2025

V-HUB: A Visual-Centric Humor Understanding Benchmark for Video LLMs

119

30 Sep 2025

AccidentBench: Benchmarking Multimodal Understanding and Reasoning in Vehicle Accidents and Beyond

...

117

30 Sep 2025

NeMo: Needle in a Montage for Video-Language Understanding

...

170

29 Sep 2025

StreamForest: Efficient Online Video Understanding with Persistent Event Memory

...

231

29 Sep 2025

VideoAnchor: Reinforcing Subspace-Structured Visual Cues for Coherent Visual-Spatial Reasoning

135

29 Sep 2025

When MLLMs Meet Compression Distortion: A Coding Paradigm Tailored to MLLMs

29 Sep 2025

LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning

164

29 Sep 2025

From Perception to Cognition: A Survey of Vision-Language Interactive Reasoning in Multimodal Large Language Models

...

448

29 Sep 2025

FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting

293

29 Sep 2025

IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?

...

257

29 Sep 2025

Perceive, Reflect and Understand Long Video: Progressive Multi-Granular Clue Exploration with Interactive Agents

142

29 Sep 2025

ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis

230

28 Sep 2025

Video Panels for Long Video Understanding

119

28 Sep 2025

FrameMind: Frame-Interleaved Video Reasoning via Reinforcement Learning

249

28 Sep 2025

Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning

129

28 Sep 2025

Evaluating point-light biological motion in multimodal large language models

122

27 Sep 2025

SPIKE-RL: Video-LLMs meet Bayesian Surprise

100

27 Sep 2025

WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM

117

26 Sep 2025

VideoScore2: Think before You Score in Generative Video Evaluation

...

1.2K

26 Sep 2025

Lightweight Structured Multimodal Reasoning for Clinical Scene Understanding in Robotics

Saurav Jha

Stefan K. Ehrlich

LM&Ro

26 Sep 2025

VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception

213

25 Sep 2025