ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2308.09126
  4. Cited By
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language
  Understanding

EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding

17 August 2023
K. Mangalam
Raiymbek Akshulakov
Jitendra Malik
ArXivPDFHTML

Papers citing "EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding"

50 / 190 papers shown
Title
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
Wenxuan Zhu
Bing Li
Cheng Zheng
Jinjie Mai
Jun-Cheng Chen
...
Abdullah Hamdi
Sara Rojas Martinez
Chia-Wen Lin
Mohamed Elhoseiny
Bernard Ghanem
VLM
48
0
0
22 Mar 2025
Agentic Keyframe Search for Video Question Answering
Agentic Keyframe Search for Video Question Answering
Sunqi Fan
Meng-Hao Guo
Shuojin Yang
45
0
0
20 Mar 2025
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding
Chongjun Tu
Lin Zhang
Pengtao Chen
Peng Ye
Xianfang Zeng
W. Cheng
Gang Yu
Tao Chen
79
0
0
19 Mar 2025
EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining
EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining
Boshen Xu
Yuting Mei
Xinbi Liu
Sipeng Zheng
Qin Jin
VLM
MDE
60
0
0
19 Mar 2025
VITED: Video Temporal Evidence Distillation
VITED: Video Temporal Evidence Distillation
Yujie Lu
Yale Song
William Yang Wang
Lorenzo Torresani
Tushar Nagarajan
56
0
0
17 Mar 2025
Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos
Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos
Chiara Plizzari
A. Tonioni
Yongqin Xian
Achin Kulshrestha
F. Tombari
EgoV
56
0
0
17 Mar 2025
A Survey on the Optimization of Large Language Model-based Agents
A Survey on the Optimization of Large Language Model-based Agents
Shangheng Du
Jiabao Zhao
Jinxin Shi
Zhentao Xie
Xin Jiang
Yanhong Bai
Liang He
LLMAG
LM&Ro
LM&MA
126
0
0
16 Mar 2025
ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos
ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos
Peiran Wu
Yunze Liu
Chonghan Liu
Miao Liu
VGen
LRM
57
1
0
16 Mar 2025
Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?
Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?
Tianyuan Qu
Longxiang Tang
Bohao Peng
Senqiao Yang
Bei Yu
Jiaya Jia
VLM
77
0
0
16 Mar 2025
LLaVA-MLB: Mitigating and Leveraging Attention Bias for Training-Free Video LLMs
Leqi Shen
Tao He
Guoqiang Gong
Fan Yang
Y. Zhang
Pengzhang Liu
Sicheng Zhao
Guiguang Ding
42
0
0
14 Mar 2025
Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing
Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing
Yudong Liu
Jingwei Sun
Yueqian Lin
Jingyang Zhang
Ming Yin
Qinsi Wang
J. Zhang
H. Li
Y. Chen
VLM
68
2
0
13 Mar 2025
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
Boyu Chen
Zhengrong Yue
Siran Chen
Z. Wang
Yang Liu
Peng Li
Y. Wang
VLM
76
0
0
13 Mar 2025
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game
Z. Wang
Yurui Dong
Fuwen Luo
Minyuan Ruan
Zhili Cheng
C. L. P. Chen
Peng Li
Yang Liu
LRM
79
0
0
13 Mar 2025
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
Md. Mohaiminul Islam
Tushar Nagarajan
Huiyu Wang
Gedas Bertasius
Lorenzo Torresani
66
0
0
12 Mar 2025
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
Kevin Qinghong Lin
Mike Zheng Shou
VGen
71
1
0
12 Mar 2025
Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding
Haoyu Zhang
Qiaohui Chu
Meng Liu
Yunxiao Wang
Bin Wen
Fan Yang
Tingting Gao
Di Zhang
Yaowei Wang
Liqiang Nie
EgoV
68
0
0
12 Mar 2025
RAG-Adapter: A Plug-and-Play RAG-enhanced Framework for Long Video Understanding
Xichen Tan
Yunfan Ye
Yuanjing Luo
Qian Wan
Fang Liu
Zhiping Cai
VLM
61
1
0
11 Mar 2025
EgoBlind: Towards Egocentric Visual Assistance for the Blind People
Junbin Xiao
Nanxin Huang
Hao Qiu
Zhulin Tao
Xun Yang
Richang Hong
M. Wang
Angela Yao
EgoV
VLM
63
0
0
11 Mar 2025
ALLVB: All-in-One Long Video Understanding Benchmark
ALLVB: All-in-One Long Video Understanding Benchmark
Xichen Tan
Yuanjing Luo
Yunfan Ye
Fang Liu
Zhiping Cai
MLLM
VLM
67
0
0
10 Mar 2025
Towards Fine-Grained Video Question Answering
Wei Dai
Alan Luo
Zane Durante
Debadutta Dash
Arnold Milstein
Kevin Schulman
Ehsan Adeli
L. Fei-Fei
58
1
0
10 Mar 2025
UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces
Baining Zhao
Jianjie Fang
Zichao Dai
Z. Wang
Jirong Zha
...
Chen Gao
Y. Wang
Jinqiang Cui
Xinlei Chen
Y. Li
48
2
0
08 Mar 2025
EgoLife: Towards Egocentric Life Assistant
Jingkang Yang
Shuai Liu
Hongming Guo
Yuhao Dong
X. Zhang
...
Joerg Widmer
Francesco Gringoli
Lei Yang
Bo Li
Z. Liu
EgoV
49
2
0
05 Mar 2025
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Saeed Ranjbar Alvar
Gursimran Singh
Mohammad Akbari
Yong Zhang
VLM
73
0
0
04 Mar 2025
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization
Zitang Zhou
Ke Mei
Yu Lu
Tianyi Wang
Fengyun Rao
83
2
0
03 Mar 2025
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
Shangzhe Di
Zhelun Yu
Guanghao Zhang
Haoyuan Li
Tao Zhong
Hao Cheng
Bolin Li
Wanggui He
Fangxun Shu
Hao Jiang
68
4
0
01 Mar 2025
EgoNormia: Benchmarking Physical Social Norm Understanding
EgoNormia: Benchmarking Physical Social Norm Understanding
MohammadHossein Rezaei
Yicheng Fu
Phil Cuvin
Caleb Ziems
Y. Zhang
Hao Zhu
Diyi Yang
EgoV
48
0
0
27 Feb 2025
OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering
OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering
Jiahao Nick Li
Zhuohao Jerry Zhang
Zhang
51
0
0
24 Feb 2025
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Guangzhi Sun
Yudong Yang
Jimin Zhuang
Changli Tang
Y. Li
W. Li
Z. Ma
Chao Zhang
LRM
MLLM
VLM
64
2
0
17 Feb 2025
VRoPE: Rotary Position Embedding for Video Large Language Models
VRoPE: Rotary Position Embedding for Video Large Language Models
Zikang Liu
Longteng Guo
Yepeng Tang
Junxian Cai
Kai Ma
Xi Chen
J. Liu
49
0
0
17 Feb 2025
SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding
SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding
Zhenyu Yang
Y. Hu
Zemin Du
Dizhan Xue
Shengsheng Qian
Jiahong Wu
Fan Yang
W. Dong
Changsheng Xu
47
2
0
15 Feb 2025
VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos
VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos
Xubin Ren
Lingrui Xu
Long Xia
S. Wang
Dawei Yin
Chao Huang
VGen
VLM
66
3
0
03 Feb 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
J. Liu
Tao Zhang
Tao Zhang
S. Chen
...
Jianhua Xu
Haoze Sun
Mingan Lin
Zenan Zhou
Weipeng Chen
AuLLM
67
10
0
28 Jan 2025
ENTER: Event Based Interpretable Reasoning for VideoQA
ENTER: Event Based Interpretable Reasoning for VideoQA
Hammad A. Ayyubi
Junzhang Liu
Ali Asgarov
Zaber Ibn Abdul Hakim
Najibul Haque Sarker
...
Md. Atabuzzaman
Xudong Lin
Naveen Reddy Dyava
Shih-Fu Chang
Chris Thomas
NAI
52
2
0
24 Jan 2025
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
Yi Wang
Xinhao Li
Ziang Yan
Yinan He
Jiashuo Yu
...
Kai Chen
Wenhai Wang
Yu Qiao
Yali Wang
Limin Wang
73
19
0
21 Jan 2025
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Shaolei Zhang
Qingkai Fang
Zhe Yang
Yang Feng
MLLM
VLM
59
24
0
07 Jan 2025
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
Wenyi Hong
Yean Cheng
Z. Yang
Weihan Wang
Lefan Wang
Xiaotao Gu
Shiyu Huang
Yuxiao Dong
J. Tang
CoGe
VLM
71
4
0
06 Jan 2025
MLVU: Benchmarking Multi-task Long Video Understanding
MLVU: Benchmarking Multi-task Long Video Understanding
Junjie Zhou
Yan Shu
Bo Zhao
Boya Wu
Zhengyang Liang
...
Xi Yang
Y. Xiong
Bo Zhang
Tiejun Huang
Zheng Liu
VLM
56
11
0
03 Jan 2025
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzdeh
VLM
56
23
0
31 Dec 2024
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
Xinhao Li
Yi Wang
Jiashuo Yu
Xiangyu Zeng
Yuhan Zhu
...
Yinan He
Chenting Wang
Yu Qiao
Yali Wang
L. Wang
VLM
73
25
0
31 Dec 2024
SCBench: A Sports Commentary Benchmark for Video LLMs
SCBench: A Sports Commentary Benchmark for Video LLMs
Kuangzhi Ge
L. Chen
Kevin Zhang
Yulin Luo
Tianyu Shi
Liaoyuan Fan
Xiang Li
Guanqun Wang
S. Zhang
41
0
0
23 Dec 2024
Thinking in Space: How Multimodal Large Language Models See, Remember,
  and Recall Spaces
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
Jihan Yang
Shusheng Yang
Anjali W. Gupta
Rilyn Han
Li Fei-Fei
Saining Xie
LRM
119
50
0
18 Dec 2024
Do Language Models Understand Time?
Do Language Models Understand Time?
Xi Ding
Lei Wang
162
0
0
18 Dec 2024
Neptune: The Long Orbit to Benchmarking Long Video Understanding
Arsha Nagrani
Mingda Zhang
Ramin Mehran
Rachel Hornung
N. B. Gundavarapu
...
Boqing Gong
Cordelia Schmid
Mikhail Sirotenko
Yukun Zhu
Tobias Weyand
98
4
0
12 Dec 2024
Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM
Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM
H. Wang
Yuxiang Nie
Yongjie Ye
Deng GuanYu
Yanjie Wang
Shuai Li
Haiyang Yu
Jinghui Lu
Can Huang
VLM
MLLM
79
1
0
12 Dec 2024
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Zhisheng Zhong
Chengyao Wang
Yuqi Liu
Senqiao Yang
Longxiang Tang
...
Shaozuo Yu
Sitong Wu
Eric Lo
Shu-Lin Liu
Jiaya Jia
AuLLM
100
6
0
12 Dec 2024
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and
  Pruning
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
Yiwu Zhong
Zhuoming Liu
Yin Li
Liwei Wang
82
2
0
04 Dec 2024
Progress-Aware Video Frame Captioning
Progress-Aware Video Frame Captioning
Zihui Xue
Joungbin An
Xitong Yang
Kristen Grauman
98
1
0
03 Dec 2024
PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos
PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos
Meng Cao
Haoran Tang
Haoze Zhao
Hangyu Guo
J. H. Liu
Ge Zhang
Ruyang Liu
Qiang Sun
Ian Reid
Xiaodan Liang
93
2
0
02 Dec 2024
Eyes on the Road: State-of-the-Art Video Question Answering Models
  Assessment for Traffic Monitoring Tasks
Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks
Joseph Raj Vishal
Divesh Basina
Aarya Choudhary
Bharatesh Chakravarthi
64
1
0
02 Dec 2024
VideoSAVi: Self-Aligned Video Language Models without Human Supervision
VideoSAVi: Self-Aligned Video Language Models without Human Supervision
Yogesh Kulkarni
Pooyan Fazli
VLM
93
2
0
01 Dec 2024
Previous
1234
Next