From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding

27 September 2024

Huaijian Zhang

Papers citing "From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding"

DyGEnc: Encoding a Sequence of Textual Scene Graphs to Reason and Answer Questions in Dynamic Scenes
S. Linok, Vadim Semenov, Anastasia Trunova, Oleg Bulichev, Dmitry A. Yudin
06 May 2025

ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
Xiao Wang, Qingyi Si, Jianlong Wu, Shiyu Zhu, Li Cao, Liqiang Nie
VLM
29 Dec 2024