ResearchTrend.AI

LVBench: An Extreme Long Video Understanding Benchmark
arXiv:2406.08035 · v3 (latest) · 12 June 2024
Weihan Wang, Zehai He, Wenyi Hong, Yean Cheng, Xiaohan Zhang, Ji Qi, Shiyu Huang, Bin Xu, Yuxiao Dong, Ming Ding, Jie Tang
ELM · VLM
arXiv (abs) · PDF · HTML · HuggingFace (1 upvote) · GitHub

Papers citing "LVBench: An Extreme Long Video Understanding Benchmark"

50 / 146 papers shown
WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning
Woongyeong Yeo, Kangsan Kim, Jaehong Yoon, Sung Ju Hwang
KELM · VLM · LRM
30 Mar 2026

Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding
Ziyang Wang, Honglu Zhou, Shijie Wang, Junnan Li, Caiming Xiong, Silvio Savarese, Mohit Bansal, Michael S Ryoo, Juan Carlos Niebles
05 Dec 2025

ViDiC: Video Difference Captioning
J. Wu, S. Li, Zhaozhou Bian, J. Chen, Runzhe Wen, An Ping, Yiwen He, Jiakai Wang, Yuanxing Zhang, Jiaheng Liu
CoGe · VLM
03 Dec 2025

EEA: Exploration-Exploitation Agent for Long Video Understanding
Te Yang, Xiangyu Zhu, Bo Wang, Quan Chen, Peng Jiang, Zhen Lei
03 Dec 2025

TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models
Zhiheng Liu, Weiming Ren, Haozhe Liu, Zijian Zhou, S. Chen, ..., Ping Luo, Wei Liu, Tao Xiang, Jonas Schult, Yuren Cong
01 Dec 2025

ViRectify: A Challenging Benchmark for Video Reasoning Correction with Multimodal Large Language Models
Xusen Hei, Jiali Chen, Jinyu Yang, Mengchen Zhao, Yi Cai
LRM
01 Dec 2025

HanDyVQA: A Video QA Benchmark for Fine-Grained Hand-Object Interaction Dynamics
Masatoshi Tateno, Gido Kato, Hirokatsu Kataoka, Yoichi Sato, Takuma Yagi
30 Nov 2025

Qwen3-VL Technical Report
Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, ..., Jingren Zhou, F. I. S. Kevin Zhou, J. Zhou, Yuanzhi Zhu, Ke Zhu
VLM
26 Nov 2025

LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
Zuhao Yang, Sudong Wang, Kaichen Zhang, Keming Wu, Sicong Leng, ..., Bo Li, Chengwei Qin, Shijian Lu, X. Li, Lidong Bing
LRM · VLM
25 Nov 2025

Unboxing the Black Box: Mechanistic Interpretability for Algorithmic Understanding of Neural Networks
Bianka Kowalska, Halina Kwaśnicka
24 Nov 2025

Vidi2.5: Large Multimodal Models for Video Understanding and Creation
Vidi Team, Celong Liu, Chia-Wen Kuo, Chuang Huang, Dawei Du, ..., Yicheng He, Yiming Cui, Zhenfang Chen, Zhihua Wu, Zuhua Lin
24 Nov 2025

LAST: LeArning to Think in Space and Time for Generalist Vision-Language Models
Shuai Wang, D. Zhang, Tianyi Bai, Shitong Shao, Jiebo Luo, Jiaheng Wei
VLM
24 Nov 2025

EgoVITA: Learning to Plan and Verify for Egocentric Video Reasoning
Yogesh Kulkarni, Pooyan Fazli
EgoV · LRM
23 Nov 2025

EventBench: Towards Comprehensive Benchmarking of Event-based MLLMs
Shaoyu Liu, Jianing Li, Guanghui Zhao, Y. Zhang, Xiangyang Ji
23 Nov 2025

TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding
Boshen Xu, Zihan Xiao, Jiaze Li, Jianzhong Ju, Zhenbo Luo, Jian Luan, Qin Jin
Mamba
20 Nov 2025

FoleyBench: A Benchmark For Video-to-Audio Models
Satvik Dixit, Koichi Saito, Zhi-Wei Zhong, Yuki Mitsufuji, Chris Donahue
VGen · AuLLM
17 Nov 2025

REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding
Jiaze Li, Hao Yin, Wenhui Tan, Jingyang Chen, Boshen Xu, Yuxun Qu, Yijing Chen, Jianzhong Ju, Zhenbo Luo, Jian Luan
LRM · VLM
17 Nov 2025

Seeing the Forest and the Trees: Query-Aware Tokenizer for Long-Video Multimodal Language Models
Siyou Li, Huanan Wu, Juexi Shao, Yinghao Ma, Yujian Gan, ..., Lu Wang, Wengqing Wu, Le Zhang, Massimo Poesio, Juntao Yu
VLM
14 Nov 2025

LiveStar: Live Streaming Assistant for Real-World Online Video Understanding
Zhenyu Yang, Kairui Zhang, Yuhang Hu, Bing Wang, Shengsheng Qian, Bin Wen, Fan Yang, Tingting Gao, Weiming Dong, Changsheng Xu
OffRL · AI4TS · VLM
07 Nov 2025

Revisiting Multimodal Positional Encoding in Vision-Language Models
Jie Huang, Xuejing Liu, Sibo Song, Ruibing Hou, Hong Chang, Junyang Lin, S. Bai
27 Oct 2025

A Video Is Not Worth a Thousand Words
Sam Pollard, Michael Wray
27 Oct 2025

Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning
Shijian Wang, Jiarui Jin, Xingjian Wang, L. Song, Runhao Fu, H. Wang, Zongyuan Ge, Yuan Lu, Xuelian Cheng
ReLM · LRM
27 Oct 2025

Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence
Kun Ouyang, Yuanxin Liu, Linli Yao, Yishuo Cai, Hao Zhou, Jie Zhou, Fandong Meng, Xu Sun
OffRL · LRM · ReLM
23 Oct 2025

SeViCES: Unifying Semantic-Visual Evidence Consensus for Long Video Understanding
Yuan Sheng, Y. Hao, Chenxu Li, Shuo Wang, Xiangnan He
23 Oct 2025

MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues
Yaning Pan, Z. Wang, Qianqian Xie, Yongqian Wen, Y. Zhang, ..., An Ping, Tianhao Peng, Jiaheng Liu
20 Oct 2025

Recurrent Attention-based Token Selection for Efficient Streaming Video-LLMs
Vaggelis Dorovatas, Soroush Seifi, Gunshi Gupta, Rahaf Aljundi
20 Oct 2025

Select Less, Reason More: Prioritizing Evidence Purity for Video Reasoning
Xuchen Li, Xuzhao Li, Shiyu Hu, Kaiqi Huang
17 Oct 2025

VideoLucy: Deep Memory Backtracking for Long Video Understanding
Jialong Zuo, Yongtai Deng, Lingdong Kong, J. Yang, Rui Jin, Y. Zhang, Nong Sang, Liang Pan, Ziwei Liu, Changxin Gao
14 Oct 2025

video-SALMONN S: Memory-Enhanced Streaming Audio-Visual LLM
Guangzhi Sun, Yixuan Li, Xiaodong Wu, Yudong Yang, Wei Li, Zejun Ma, Chao Zhang
13 Oct 2025

A Survey on Agentic Multimodal Large Language Models
Huanjin Yao, Ruifei Zhang, Jiaxing Huang, Jingyi Zhang, Yibo Wang, ..., Ruolin Zhu, Yongcheng Jing, Shunyu Liu, Guanbin Li, Dacheng Tao
LM&Ro · AIFin · AI4TS · LRM · AI4CE
13 Oct 2025

ExpVid: A Benchmark for Experiment Video Understanding & Reasoning
Yicheng Xu, Y. Wu, Jiashuo Yu, Ziang Yan, Tianxiang Jiang, ..., Kai Chen, Yu Qiao, Limin Wang, Manabu Okumura, Y. Wang
LRM
13 Oct 2025

ChoirRec: Semantic User Grouping via LLMs for Conversion Rate Prediction of Low-Activity Users
Dakai Zhai, Jiong Gao, Boya Du, Junwei Xu, Qijie Shen, J. Zhu, Yuning Jiang
10 Oct 2025

Harnessing Synthetic Preference Data for Enhancing Temporal Understanding of Video-LLMs
Sameep Vani, Shreyas Jena, Maitreya Patel, Chitta Baral, Somak Aditya, Yezhou Yang
AI4TS · SyDa
04 Oct 2025

v-HUB: A Benchmark for Video Humor Understanding from Vision and Sound
Zhengpeng Shi, Hengli Li, Yanpeng Zhao, Jianqun Zhou, Yuxuan Wang, Qinrong Cui, Wei Bi, Songchun Zhu, Bo Zhao
VLM
30 Sep 2025

AccidentBench: Benchmarking Multimodal Understanding and Reasoning in Vehicle Accidents and Beyond
Shangding Gu, Xiaohan Wang, Donghao Ying, Haoyu Zhao, Runing Yang, ..., Marco Pavone, Serena Yeung-Levy, Jun Wang, Dawn Song, C. Spanos
30 Sep 2025

NeMo: Needle in a Montage for Video-Language Understanding
Zi-Yuan Hu, Shuo Liang, Duo Zheng, Yanyang Li, Yeyao Tao, ..., Jianguang Yu, Jing-ling Huang, Meng Fang, Yin Li, Liwei Wang
29 Sep 2025

FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting
Zefeng He, Xiaoye Qu, Yafu Li, Siyuan Huang, Daizong Liu, Yu Cheng
OffRL · VLM · LRM
29 Sep 2025

LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning
Shenghao Fu, Q. Yang, Yuan-Ming Li, Xihan Wei, Xiaohua Xie, Wei-Shi Zheng
LRM
29 Sep 2025

FreeRet: MLLMs as Training-Free Retrievers
Yuhan Zhu, Xiangyu Zeng, Chenting Wang, Xinhao Li, Yicheng Xu, Ziang Yan, Yi Wang, Limin Wang
OffRL · VLM · LRM
29 Sep 2025

ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis
Congzhi Zhang, Zhibin Wang, Yinchao Ma, Jiawei Peng, Y. Wang, Qiang Zhou, Jun Song, Bo Zheng
OffRL · AI4TS · LRM
28 Sep 2025

Evaluating point-light biological motion in multimodal large language models
Akila Kadambi, Marco Iacoboni, Lisa Aziz-Zadeh, Srini Narayanan
27 Sep 2025

VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception
Ziang Yan, Xinhao Li, Yinan He, Zhengrong Yue, Xiangyu Zeng, Yali Wang, Yu Qiao, Limin Wang, Yi Wang
MLLM · VLM · LRM
25 Sep 2025

ConViS-Bench: Estimating Video Similarity Through Semantic Concepts
Benedetta Liberatori, Alessandro Conti, Lorenzo Vaquero, Yiming Wang, Elisa Ricci, Paolo Rota
23 Sep 2025

Do Modern Video-LLMs Need to Listen? A Benchmark Audit and Scalable Remedy
Geewook Kim, Minjoon Seo
AuLLM
22 Sep 2025

NeuS-QA: Grounding Long-Form Video Understanding in Temporal Logic and Neuro-Symbolic Reasoning
Sahil Shah, S P Sharan, Harsh Goel, Minkyu Choi, Mustafa Munir, Manvik Pasula, R. Marculescu, Sandeep Chinchali
NAI
22 Sep 2025

VideoPro: Adaptive Program Reasoning for Long Video Understanding
Chenglin Li, Feng Han, Feng Tao, Ruilin Li, Qianglong Chen, ..., Jingqi Tong, Yin Zhang, Jiaqi Wang
LRM
22 Sep 2025

Qwen3-Omni Technical Report
Jin Xu, Zhifang Guo, Hangrui Hu, Yunfei Chu, Xiong Wang, ..., Bowen Yu, Jianxin Yang, Le Yu, Jingren Zhou, Junyang Lin
AuLLM · VGen · VLM
22 Sep 2025

ChronoForge-RL: Chronological Forging through Reinforcement Learning for Enhanced Video Understanding
Kehua Chen
VGen
19 Sep 2025

Cinéaste: A Fine-grained Contextual Movie Question Answering Benchmark
Nisarg A. Shah, Amir Ziai, Chaitanya Ekanadham, Vishal M. Patel
VGen · CoGe · ELM
17 Sep 2025

AToken: A Unified Tokenizer for Vision
Jiasen Lu, Liangchen Song, Mingze Xu, Byeongjoo Ahn, Yanjun Wang, Chen Chen, Afshin Dehghan, Yinfei Yang
ViT
17 Sep 2025

Page 1 of 3