Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2409.14485
Cited By
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding
22 September 2024
Yan Shu
Peitian Zhang
Zheng Liu
Minghao Qin
Junjie Zhou
Tiejun Huang
Bo Zhao
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding"
9 / 9 papers shown
Title
Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding
Xiangrui Liu
Yan Shu
Zheng Liu
Ao Li
Yang Tian
Bo Zhao
VGen
VLM
86
0
0
24 Mar 2025
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
Jinlong Li
Cristiano Saltori
Fabio Poiesi
N. Sebe
71
0
0
20 Mar 2025
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
Boyu Chen
Zhengrong Yue
Siran Chen
Z. Wang
Yang Liu
Peng Li
Y. Wang
VLM
71
0
0
13 Mar 2025
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
Md. Mohaiminul Islam
Tushar Nagarajan
Huiyu Wang
Gedas Bertasius
Lorenzo Torresani
66
0
0
12 Mar 2025
MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference
Zhongwei Wan
H. Shen
Xin Wang
C. Liu
Zheda Mai
M. Zhang
VLM
58
3
0
24 Feb 2025
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
Yi Wang
Xinhao Li
Ziang Yan
Yinan He
Jiashuo Yu
...
Kai Chen
Wenhai Wang
Yu Qiao
Yali Wang
Limin Wang
70
19
0
21 Jan 2025
ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
Xiao Wang
Qingyi Si
Jianlong Wu
Shiyu Zhu
Li Cao
Liqiang Nie
VLM
74
6
0
29 Dec 2024
Do Language Models Understand Time?
Xi Ding
Lei Wang
162
0
0
18 Dec 2024
VideoSAVi: Self-Aligned Video Language Models without Human Supervision
Yogesh Kulkarni
Pooyan Fazli
VLM
90
2
0
01 Dec 2024
1