arXiv:2403.11481 (v1; v2 latest)
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
18 March 2024
Yue Fan
Xiaojian Ma
Rujie Wu
Yuntao Du
Jiaqi Li
Zhi Gao
Qing Li
VLM
LLMAG
Papers citing "VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding" (98 papers; first 50 listed)
Agentic Learner with Grow-and-Refine Multimodal Semantic Memory
Weihao Bo
Shan Zhang
Yanpeng Sun
Jingjing Wu
Qunyi Xie
...
Wei He
Xiaofan Li
Na Zhao
Jingdong Wang
Z. Li
LRM
182
0
0
26 Nov 2025
LAST: LeArning to Think in Space and Time for Generalist Vision-Language Models
Shuai Wang
D. Zhang
Tianyi Bai
Shitong Shao
Jiebo Luo
Jiaheng Wei
VLM
124
0
0
24 Nov 2025
VDC-Agent: When Video Detailed Captioners Evolve Themselves via Agentic Self-Reflection
Qiang Wang
Xinyuan Gao
Songlin Dong
Jizhou Han
Jiangyang Li
Yuhang He
Yihong Gong
VGen
110
0
0
24 Nov 2025
VideoChat-M1: Collaborative Policy Planning for Video Understanding via Multi-Agent Reinforcement Learning
Boyu Chen
Zikang Wang
Zhengrong Yue
Kainan Yan
Chenyun Yu
...
Yafei Wen
Xiaoxin Chen
Yang Liu
Peng Li
Yali Wang
LLMAG
248
0
0
24 Nov 2025
SciEducator: Scientific Video Understanding and Educating via Deming-Cycle Multi-Agent System
Zhiyu Xu
Weilong Yan
Yufei Shi
Xin Meng
Tao He
Huiping Zhuang
Ming Li
Hehe Fan
LLMAG
LRM
142
0
0
22 Nov 2025
TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding
Boshen Xu
Zihan Xiao
Jiaze Li
Jianzhong Ju
Zhenbo Luo
Jian Luan
Qin Jin
Mamba
423
0
0
20 Nov 2025
ReaSon: Reinforced Causal Search with Information Bottleneck for Video Understanding
Yuan Zhou
Litao Hua
Shilong Jin
Wentao Huang
Haoran Duan
CML
VGen
189
0
0
16 Nov 2025
Striking the Right Balance between Compute and Copy: Improving LLM Inferencing Under Speculative Decoding
Arun Ramachandran
Ramaswamy Govindarajan
M. Annavaram
Prakash Raghavendra
Hossein Entezari Zarch
Lei Gao
Chaoyi Jiang
104
0
0
15 Nov 2025
Simulating the Visual World with Artificial Intelligence: A Roadmap
Jingtong Yue
Z. Huang
Z. Chen
Xintao Wang
Pengfei Wan
Ziwei Liu
VGen
LM&Ro
336
0
0
11 Nov 2025
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist
Z. Liang
D. Zhang
Huichi Zhou
Rui Huang
Bobo Li
...
Shengqiong Wu
X. Wang
Jiebo Luo
Lizi Liao
Hao Fei
VGen
153
0
0
11 Nov 2025
Learning with Preserving for Continual Multitask Learning
H. Wang
Siwoo Bae
Zirong Chen
Meiyi Ma
CLL
144
0
0
11 Nov 2025
Can Visual Input Be Compressed? A Visual Token Compression Benchmark for Large Multimodal Models
Tianfan Peng
Yuntao Du
Pengzhou Ji
Shijie Dong
Kailin Jiang
...
Jinhe Bi
Qian Li
Wei Du
Feng Xiao
Lizhen Cui
VLM
192
0
0
04 Nov 2025
Paper2Web: Let's Make Your Paper Alive!
Yuhang Chen
Tianpeng Lv
Siyi Zhang
Yixiang Yin
Yao Wan
Philip S. Yu
Dongping Chen
136
0
0
17 Oct 2025
VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning
Jinglei Zhang
Yuanfan Guo
Rolandos Alexandros Potamias
Jiankang Deng
Hang Xu
Chao Ma
LRM
91
2
0
16 Oct 2025
Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding
Xiaoqian Shen
Wenxuan Zhang
Jun-Cheng Chen
Mohamed Elhoseiny
VLM
LRM
72
4
0
15 Oct 2025
VideoLucy: Deep Memory Backtracking for Long Video Understanding
Jialong Zuo
Yongtai Deng
Lingdong Kong
J. Yang
Rui Jin
Y. Zhang
Nong Sang
Liang Pan
Ziwei Liu
Changxin Gao
101
2
0
14 Oct 2025
FlowSearch: Advancing deep research with dynamic structured knowledge flow
Yusong Hu
Runmin Ma
Yue Fan
Jinxin Shi
Zongsheng Cao
...
Jiakang Yuan
Xiangchao Yan
Wenlong Zhang
Lei Bai
Bo Zhang
AI4CE
120
0
0
09 Oct 2025
MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning
Tajamul Ashraf
Umair Nawaz
Abdelrahman M. Shaker
Rao Muhammad Anwer
Philip Torr
Fahad Shahbaz Khan
Salman Khan
162
0
0
09 Oct 2025
When Thinking Drifts: Evidential Grounding for Robust Video Reasoning
M. Luo
Zihui Xue
Alex Dimakis
Kristen Grauman
VGen
LRM
220
4
0
07 Oct 2025
From Learning to Mastery: Achieving Safe and Efficient Real-World Autonomous Driving with Human-In-The-Loop Reinforcement Learning
Li Zeqiao
Wang Yijing
Wang Haoyu
Li Zheng
Li Peng
Liu Wenfei
Zuo Zhiqiang
120
0
0
07 Oct 2025
Seeing Space and Motion: Enhancing Latent Actions with Spatial and Dynamic Awareness for VLA
Zhejia Cai
Y. Yang
Xinyuan Chang
Shiyi Liang
Ronghan Chen
Feng Xiong
Mu Xu
Ruqi Huang
75
0
0
30 Sep 2025
Perceive, Reflect and Understand Long Video: Progressive Multi-Granular Clue Exploration with Interactive Agents
J. Li
Kun-Juan Wei
Zhe Xu
Zibo Su
Xu Yang
Cheng Deng
98
0
0
29 Sep 2025
FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting
Zefeng He
Xiaoye Qu
Yafu Li
Siyuan Huang
Daizong Liu
Yu Cheng
OffRL
VLM
LRM
259
6
0
29 Sep 2025
ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis
Congzhi Zhang
Zhibin Wang
Yinchao Ma
Jiawei Peng
Y. Wang
Qiang Zhou
Jun Song
Bo Zheng
OffRL
AI4TS
LRM
186
2
0
28 Sep 2025
VC-Agent: An Interactive Agent for Customized Video Dataset Collection
Yidan Zhang
Mutian Xu
Yiming Hao
Kun Zhou
Jiahao Chang
Xiaoqiang Liu
Pengfei Wan
Hongbo Fu
Xiaoguang Han
VGen
152
0
0
25 Sep 2025
COLT: Enhancing Video Large Language Models with Continual Tool Usage
Yuyang Liu
Xinyuan Shi
Xiaodan Liang
KELM
CLL
209
0
0
23 Sep 2025
MESH -- Understanding Videos Like Human: Measuring Hallucinations in Large Video Models
Garry Yang
Zizhe Chen
Man Hon Wong
Haoyu Lei
Yongqiang Chen
Zhenguo Li
Kaiwen Zhou
James Cheng
111
0
0
10 Sep 2025
AdsQA: Towards Advertisement Video Understanding
Xinwei Long
Kai Tian
Peng Xu
Guoli Jia
Jingxuan Li
...
Che Jiang
Hao Xu
Yang Liu
Jiaheng Ma
Bowen Zhou
92
2
0
10 Sep 2025
Video-MTR: Reinforced Multi-Turn Reasoning for Long Video Understanding
Yuan Xie
Tianshui Chen
Zheng Ge
L. Ni
LRM
72
7
0
28 Aug 2025
Failures to Surface Harmful Contents in Video Large Language Models
Yuxin Cao
Wei Song
Derui Wang
Jingling Xue
Jin Song Dong
AAML
119
3
0
14 Aug 2025
Empowering Multimodal LLMs with External Tools: A Comprehensive Survey
Wenbin An
Jiahao Nie
Yaqiang Wu
Feng Tian
Shijian Lu
Q. Zheng
MLLM
150
1
0
14 Aug 2025
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Lin Long
Yexiao He
Wentao Ye
Yiyuan Pan
Yuan Lin
Hang Li
Junbo Zhao
Wei Li
254
7
0
13 Aug 2025
AURA: A Fine-Grained Benchmark and Decomposed Metric for Audio-Visual Reasoning
Siminfar Samakoush Galougah
Rishie Raj
Sanjoy Chowdhury
Sayan Nag
Ramani Duraiswami
148
1
0
10 Aug 2025
VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding
Jianxiang He
Shaoguang Wang
Weiyu Guo
Yijie Xu
Ziyang Chen
149
0
0
09 Aug 2025
MV-Debate: Multi-view Agent Debate with Dynamic Reflection Gating for Multimodal Harmful Content Detection in Social Media
Rui Lu
Jinhe Bi
Yunpu Ma
Feng Xiao
Yuntao Du
Yijun Tian
217
1
0
07 Aug 2025
VideoForest: Person-Anchored Hierarchical Reasoning for Cross-Video Question Answering
Yiran Meng
Junhong Ye
Wei Zhou
Guanghui Yue
Xudong Mao
Ruomei Wang
Baoquan Zhao
94
0
0
05 Aug 2025
StreamAgent: Towards Anticipatory Agents for Streaming Video Understanding
Haolin Yang
Feilong Tang
Linxiao Zhao
Xiang An
Ming Hu
...
Yifan Lu
Xiaofeng Zhang
Abdalla Swikir
Junjun He
Zongyuan Ge
263
1
0
03 Aug 2025
Cued-Agent: A Collaborative Multi-Agent System for Automatic Cued Speech Recognition
Guanjie Huang
Danny Hin Kwok Tsang
Shan Yang
Guangzhi Lei
Li Liu
126
0
0
01 Aug 2025
Exploring the Link Between Bayesian Inference and Embodied Intelligence: Toward Open Physical-World Embodied AI Systems
Bin Liu
208
0
0
29 Jul 2025
Augmented Vision-Language Models: A Systematic Review
Anthony C Davis
Burhan Sadiq
Tianmin Shu
Chien-Ming Huang
VLM
LRM
159
0
0
24 Jul 2025
AuroraLong: Bringing RNNs Back to Efficient Open-Ended Video Understanding
Weili Xu
Enxin Song
Wenhao Chai
Xuexiang Wen
Tian-Chun Ye
Gaoang Wang
264
3
0
03 Jul 2025
GraspMAS: Zero-Shot Language-driven Grasp Detection with Multi-Agent System
Quang H. Nguyen
T. H. Le
Huy Le Nguyen
T. Vo
Tung D. Ta
Baoru Huang
Minh Nhat Vu
Anh-Tien Nguyen
179
0
0
23 Jun 2025
AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding
Zhucun Xue
Jiangning Zhang
Xurong Xie
Yuxuan Cai
Yong-Jin Liu
Xiangtai Li
Dacheng Tao
VGen
VLM
290
4
0
16 Jun 2025
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
Ziniu Zhang
Ruiqi Wang
Hongming Guo
Penghao Wu
Yuhao Dong
Xiuying Wang
Jingkang Yang
Hao Zhang
Hongyuan Zhu
Ziwei Liu
RALM
LRM
180
15
0
16 Jun 2025
MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks
Sanjoy Chowdhury
Mohamed Elmoghany
Yohan Abeysinghe
Mahmoud Ahmed
Sayan Nag
Salman Khan
Mohamed Elhoseiny
Dinesh Manocha
289
3
0
08 Jun 2025
From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes
Tianxu Wang
Zhuofan Zhang
Ziyu Zhu
Yue Fan
Jing Xiong
Pengxiang Li
Xiaojian Ma
Qing Li
245
0
0
05 Jun 2025
SiLVR: A Simple Language-based Video Reasoning Framework
Ce Zhang
Yan-Bo Lin
Ziyang Wang
Mohit Bansal
Gedas Bertasius
LRM
138
7
0
30 May 2025
Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding
Xiaoyi Zhang
Zhaoyang Jia
Zongyu Guo
Jiahao Li
Bin Li
Houqiang Li
Yan Lu
556
9
0
23 May 2025
ViQAgent: Zero-Shot Video Question Answering via Agent with Open-Vocabulary Grounding Validation
Tony Montes
Fernando Lozano
254
2
0
21 May 2025
CoT-Vid: Dynamic Chain-of-Thought Routing with Self Verification for Training-Free Video Reasoning
Hongbo Jin
Ruyang Liu
Wenhao Zhang
Guibo Luo
Ge Li
LRM
298
1
0
17 May 2025