ResearchTrend.AI

LVBench: An Extreme Long Video Understanding Benchmark
arXiv:2406.08035 · v3 (latest) · 12 June 2024
Weihan Wang, Zehai He, Wenyi Hong, Yean Cheng, Xiaohan Zhang, Ji Qi, Shiyu Huang, Bin Xu, Yuxiao Dong, Ming Ding, Jie Tang
ELM · VLM
arXiv (abs) · PDF · HTML · HuggingFace (1 upvote) · GitHub

Papers citing "LVBench: An Extreme Long Video Understanding Benchmark"

50 / 146 papers shown
WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning
Woongyeong Yeo, Kangsan Kim, Jaehong Yoon, Sung Ju Hwang
KELM · VLM · LRM
30 Mar 2026

Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding
Ziyang Wang, Honglu Zhou, Shijie Wang, Junnan Li, Caiming Xiong, Silvio Savarese, Mohit Bansal, Michael S Ryoo, Juan Carlos Niebles
05 Dec 2025

ViDiC: Video Difference Captioning
J. Wu, S. Li, Zhaozhou Bian, J. Chen, Runzhe Wen, An Ping, Yiwen He, Jiakai Wang, Yuanxing Zhang, Jiaheng Liu
CoGe · VLM
03 Dec 2025

EEA: Exploration-Exploitation Agent for Long Video Understanding
Te Yang, Xiangyu Zhu, Bo Wang, Quan Chen, Peng Jiang, Zhen Lei
03 Dec 2025

TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models
Zhiheng Liu, Weiming Ren, Haozhe Liu, Zijian Zhou, S. Chen, ..., Ping Luo, Wei Liu, Tao Xiang, Jonas Schult, Yuren Cong
01 Dec 2025

ViRectify: A Challenging Benchmark for Video Reasoning Correction with Multimodal Large Language Models
Xusen Hei, Jiali Chen, Jinyu Yang, Mengchen Zhao, Yi Cai
LRM
01 Dec 2025

HanDyVQA: A Video QA Benchmark for Fine-Grained Hand-Object Interaction Dynamics
Masatoshi Tateno, Gido Kato, Hirokatsu Kataoka, Yoichi Sato, Takuma Yagi
30 Nov 2025

Qwen3-VL Technical Report
Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, ..., Jingren Zhou, F. I. S. Kevin Zhou, J. Zhou, Yuanzhi Zhu, Ke Zhu
VLM
26 Nov 2025

LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
Zuhao Yang, Sudong Wang, Kaichen Zhang, Keming Wu, Sicong Leng, ..., Bo Li, Chengwei Qin, Shijian Lu, X. Li, Lidong Bing
LRM · VLM
25 Nov 2025

Unboxing the Black Box: Mechanistic Interpretability for Algorithmic Understanding of Neural Networks
Bianka Kowalska, Halina Kwaśnicka
24 Nov 2025

Vidi2.5: Large Multimodal Models for Video Understanding and Creation
Vidi Team, Celong Liu, Chia-Wen Kuo, Chuang Huang, Dawei Du, ..., Yicheng He, Yiming Cui, Zhenfang Chen, Zhihua Wu, Zuhua Lin
24 Nov 2025

LAST: LeArning to Think in Space and Time for Generalist Vision-Language Models
Shuai Wang, D. Zhang, Tianyi Bai, Shitong Shao, Jiebo Luo, Jiaheng Wei
VLM
24 Nov 2025

EgoVITA: Learning to Plan and Verify for Egocentric Video Reasoning
Yogesh Kulkarni, Pooyan Fazli
EgoV · LRM
23 Nov 2025

EventBench: Towards Comprehensive Benchmarking of Event-based MLLMs
Shaoyu Liu, Jianing Li, Guanghui Zhao, Y. Zhang, Xiangyang Ji
23 Nov 2025

TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding
Boshen Xu, Zihan Xiao, Jiaze Li, Jianzhong Ju, Zhenbo Luo, Jian Luan, Qin Jin
Mamba
20 Nov 2025

FoleyBench: A Benchmark For Video-to-Audio Models
Satvik Dixit, Koichi Saito, Zhi-Wei Zhong, Yuki Mitsufuji, Chris Donahue
VGen · AuLLM
17 Nov 2025

REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding
Jiaze Li, Hao Yin, Wenhui Tan, Jingyang Chen, Boshen Xu, Yuxun Qu, Yijing Chen, Jianzhong Ju, Zhenbo Luo, Jian Luan
LRM · VLM
17 Nov 2025

Seeing the Forest and the Trees: Query-Aware Tokenizer for Long-Video Multimodal Language Models
Siyou Li, Huanan Wu, Juexi Shao, Yinghao Ma, Yujian Gan, ..., Lu Wang, Wengqing Wu, Le Zhang, Massimo Poesio, Juntao Yu
VLM
14 Nov 2025

LiveStar: Live Streaming Assistant for Real-World Online Video Understanding
Zhenyu Yang, Kairui Zhang, Yuhang Hu, Bing Wang, Shengsheng Qian, Bin Wen, Fan Yang, Tingting Gao, Weiming Dong, Changsheng Xu
OffRL · AI4TS · VLM
07 Nov 2025

Revisiting Multimodal Positional Encoding in Vision-Language Models
Jie Huang, Xuejing Liu, Sibo Song, Ruibing Hou, Hong Chang, Junyang Lin, S. Bai
27 Oct 2025

A Video Is Not Worth a Thousand Words
Sam Pollard, Michael Wray
27 Oct 2025

Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning
Shijian Wang, Jiarui Jin, Xingjian Wang, L. Song, Runhao Fu, H. Wang, Zongyuan Ge, Yuan Lu, Xuelian Cheng
ReLM · LRM
27 Oct 2025

Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence
Kun Ouyang, Yuanxin Liu, Linli Yao, Yishuo Cai, Hao Zhou, Jie Zhou, Fandong Meng, Xu Sun
OffRL · LRM · ReLM
23 Oct 2025

SeViCES: Unifying Semantic-Visual Evidence Consensus for Long Video Understanding
Yuan Sheng, Y. Hao, Chenxu Li, Shuo Wang, Xiangnan He
23 Oct 2025

MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues
Yaning Pan, Z. Wang, Qianqian Xie, Yongqian Wen, Y. Zhang, ..., An Ping, Tianhao Peng, Jiaheng Liu
20 Oct 2025

Recurrent Attention-based Token Selection for Efficient Streaming Video-LLMs
Vaggelis Dorovatas, Soroush Seifi, Gunshi Gupta, Rahaf Aljundi
20 Oct 2025

Select Less, Reason More: Prioritizing Evidence Purity for Video Reasoning
Xuchen Li, Xuzhao Li, Shiyu Hu, Kaiqi Huang
17 Oct 2025

VideoLucy: Deep Memory Backtracking for Long Video Understanding
Jialong Zuo, Yongtai Deng, Lingdong Kong, J. Yang, Rui Jin, Y. Zhang, Nong Sang, Liang Pan, Ziwei Liu, Changxin Gao
14 Oct 2025

video-SALMONN S: Memory-Enhanced Streaming Audio-Visual LLM
Guangzhi Sun, Yixuan Li, Xiaodong Wu, Yudong Yang, Wei Li, Zejun Ma, Chao Zhang
13 Oct 2025

A Survey on Agentic Multimodal Large Language Models
Huanjin Yao, Ruifei Zhang, Jiaxing Huang, Jingyi Zhang, Yibo Wang, ..., Ruolin Zhu, Yongcheng Jing, Shunyu Liu, Guanbin Li, Dacheng Tao
LM&Ro · AIFin · AI4TS · LRM · AI4CE
13 Oct 2025

ExpVid: A Benchmark for Experiment Video Understanding & Reasoning
Yicheng Xu, Y. Wu, Jiashuo Yu, Ziang Yan, Tianxiang Jiang, ..., Kai Chen, Yu Qiao, Limin Wang, Manabu Okumura, Y. Wang
LRM
13 Oct 2025

ChoirRec: Semantic User Grouping via LLMs for Conversion Rate Prediction of Low-Activity Users
Dakai Zhai, Jiong Gao, Boya Du, Junwei Xu, Qijie Shen, J. Zhu, Yuning Jiang
10 Oct 2025

Harnessing Synthetic Preference Data for Enhancing Temporal Understanding of Video-LLMs
Sameep Vani, Shreyas Jena, Maitreya Patel, Chitta Baral, Somak Aditya, Yezhou Yang
AI4TS · SyDa
04 Oct 2025

v-HUB: A Benchmark for Video Humor Understanding from Vision and Sound
Zhengpeng Shi, Hengli Li, Yanpeng Zhao, Jianqun Zhou, Yuxuan Wang, Qinrong Cui, Wei Bi, Songchun Zhu, Bo Zhao
VLM
30 Sep 2025

AccidentBench: Benchmarking Multimodal Understanding and Reasoning in Vehicle Accidents and Beyond
Shangding Gu, Xiaohan Wang, Donghao Ying, Haoyu Zhao, Runing Yang, ..., Marco Pavone, Serena Yeung-Levy, Jun Wang, Dawn Song, C. Spanos
30 Sep 2025

NeMo: Needle in a Montage for Video-Language Understanding
Zi-Yuan Hu, Shuo Liang, Duo Zheng, Yanyang Li, Yeyao Tao, ..., Jianguang Yu, Jing-ling Huang, Meng Fang, Yin Li, Liwei Wang
29 Sep 2025

FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting
Zefeng He, Xiaoye Qu, Yafu Li, Siyuan Huang, Daizong Liu, Yu Cheng
OffRL · VLM · LRM
29 Sep 2025

LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning
Shenghao Fu, Q. Yang, Yuan-Ming Li, Xihan Wei, Xiaohua Xie, Wei-Shi Zheng
LRM
29 Sep 2025

FreeRet: MLLMs as Training-Free Retrievers
Yuhan Zhu, Xiangyu Zeng, Chenting Wang, Xinhao Li, Yicheng Xu, Ziang Yan, Yi Wang, Limin Wang
OffRL · VLM · LRM
29 Sep 2025

ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis
Congzhi Zhang, Zhibin Wang, Yinchao Ma, Jiawei Peng, Y. Wang, Qiang Zhou, Jun Song, Bo Zheng
OffRL · AI4TS · LRM
28 Sep 2025

Evaluating point-light biological motion in multimodal large language models
Akila Kadambi, Marco Iacoboni, Lisa Aziz-Zadeh, Srini Narayanan
27 Sep 2025

VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception
Ziang Yan, Xinhao Li, Yinan He, Zhengrong Yue, Xiangyu Zeng, Yali Wang, Yu Qiao, Limin Wang, Yi Wang
MLLM · VLM · LRM
25 Sep 2025

ConViS-Bench: Estimating Video Similarity Through Semantic Concepts
Benedetta Liberatori, Alessandro Conti, Lorenzo Vaquero, Yiming Wang, Elisa Ricci, Paolo Rota
23 Sep 2025

Do Modern Video-LLMs Need to Listen? A Benchmark Audit and Scalable Remedy
Geewook Kim, Minjoon Seo
AuLLM
22 Sep 2025

NeuS-QA: Grounding Long-Form Video Understanding in Temporal Logic and Neuro-Symbolic Reasoning
Sahil Shah, S P Sharan, Harsh Goel, Minkyu Choi, Mustafa Munir, Manvik Pasula, R. Marculescu, Sandeep Chinchali
NAI
22 Sep 2025

VideoPro: Adaptive Program Reasoning for Long Video Understanding
Chenglin Li, Feng Han, Feng Tao, Ruilin Li, Qianglong Chen, ..., Jingqi Tong, Yin Zhang, Jiaqi Wang
LRM
22 Sep 2025

Qwen3-Omni Technical Report
Jin Xu, Zhifang Guo, Hangrui Hu, Yunfei Chu, Xiong Wang, ..., Bowen Yu, Jianxin Yang, Le Yu, Jingren Zhou, Junyang Lin
AuLLM · VGen · VLM
22 Sep 2025

ChronoForge-RL: Chronological Forging through Reinforcement Learning for Enhanced Video Understanding
Kehua Chen
VGen
19 Sep 2025

Cinéaste: A Fine-grained Contextual Movie Question Answering Benchmark
Nisarg A. Shah, Amir Ziai, Chaitanya Ekanadham, Vishal M. Patel
VGen · CoGe · ELM
17 Sep 2025

AToken: A Unified Tokenizer for Vision
Jiasen Lu, Liangchen Song, Mingze Xu, Byeongjoo Ahn, Yanjun Wang, Chen Chen, Afshin Dehghan, Yinfei Yang
ViT
17 Sep 2025

Page 1 of 3