
arXiv:2501.03218

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

Computer Vision and Pattern Recognition (CVPR), 2025
6 January 2025
Rui Qian
Shuangrui Ding
Xiaoyi Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Dahua Lin
Jiaqi Wang
ArXiv (abs) | PDF | HTML | HuggingFace (37 upvotes)

Papers citing "Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction"

18 / 18 papers shown
StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos
Daeun Lee
Subhojyoti Mukherjee
Branislav Kveton
Ryan Rossi
Viet Dac Lai
Seunghyun Yoon
Trung Bui
Franck Dernoncourt
Mohit Bansal
LRM
225
0
0
01 Dec 2025
StreamKV: Streaming Video Question-Answering with Segment-based KV Cache Retrieval and Compression
Yilong Chen
Xiang Bai
Zhibin Wang
Chengyu Bai
Yuhan Dai
Ming Lu
Shanghang Zhang
140
1
0
10 Nov 2025
Cambrian-S: Towards Spatial Supersensing in Video
Shusheng Yang
J. Yang
Pinzhi Huang
Ellis L Brown
Zihao Yang
...
Daohan Lu
Rob Fergus
Yann LeCun
Li Fei-Fei
Saining Xie
175
17
0
06 Nov 2025
StreamingTOM: Streaming Token Compression for Efficient Video Understanding
Xueyi Chen
Keda Tao
Kele Shao
Huan Wang
194
2
0
21 Oct 2025
video-SALMONN S: Streaming Audio-Visual LLMs Beyond Length Limits via Memory
Guangzhi Sun
Yixuan Li
Xiaodong Wu
Yudong Yang
Wei Li
Zejun Ma
Chao Zhang
87
1
0
13 Oct 2025
StreamForest: Efficient Online Video Understanding with Persistent Event Memory
Xiangyu Zeng
Kefan Qiu
Qingyu Zhang
Xinhao Li
Jing Wang
...
Kun Tian
Meng Tian
Xinhai Zhao
Yi Wang
Limin Wang
231
2
0
29 Sep 2025
Boosting Embodied AI Agents through Perception-Generation Disaggregation and Asynchronous Pipeline Execution
Shulai Zhang
Ao Xu
Quan Chen
Han Zhao
Weihao Cui
Ningxin Zheng
H. Lin
Xin Liu
Minyi Guo
115
0
0
11 Sep 2025
Less Redundancy: Boosting Practicality of Vision Language Model in Walking Assistants
Chongyang Li
Yuan Zhiqiang
Jiapei Zhang
Ying Deng
Hanbo Bi
Zexi Jia
Xiaoyue Duan
Peixiang Luo
Jinchao Zhang
VLM
139
0
0
22 Aug 2025
StreamMem: Query-Agnostic KV Cache Memory for Streaming Video Understanding
Yanlai Yang
Zhuokai Zhao
Satya Narayan Shukla
Aashu Singh
Shlok Kumar Mishra
Lizhu Zhang
Mengye Ren
VLM
124
6
0
21 Aug 2025
HumanPCR: Probing MLLM Capabilities in Diverse Human-Centric Scenes
Keliang Li
Hongze Shen
Hao Shi
Ruibing Hou
Hong Chang
...
Wen Wang
Yiling Wu
Shihong Deng
Shiguang Shan
Xilin Chen
LRM
180
1
0
19 Aug 2025
StreamAgent: Towards Anticipatory Agents for Streaming Video Understanding
Haolin Yang
Feilong Tang
Linxiao Zhao
Xiang An
Ming Hu
...
Yifan Lu
Xiaofeng Zhang
Abdalla Swikir
Junjun He
Zongyuan Ge
343
4
0
03 Aug 2025
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI
ACM Workshop on Hot Topics in Networks (HotNets), 2025
Jiangkai Wu
Zhiyuan Ren
Liming Liu
Xinggong Zhang
169
1
0
14 Jul 2025
HoliTom: Holistic Token Merging for Fast Video Large Language Models
Kele Shao
Keda Tao
Can Qin
Haoxuan You
Yang Sui
Huan Wang
VLM
618
15
0
27 May 2025
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
Haibo Wang
Bo Feng
Zhengfeng Lai
Mingze Xu
Shiyu Li
Weifeng Ge
Afshin Dehghan
Meng Cao
Ping Huang
OffRL
616
6
0
08 May 2025
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video
Shuhang Xun
Sicheng Tao
Jiajun Li
Yibo Shi
Zhixin Lin
...
Shikang Wang
Wenshu Fan
Hao Zhang
Ying Ma
Xuming Hu
VLM
LRM
378
5
0
04 May 2025
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
Linli Yao
You Li
Y. X. Wei
Lei Li
Shuhuai Ren
...
Sida Li
Dianbo Sui
Qi Liu
Yanzhe Zhang
Xu Sun
275
17
0
24 Apr 2025
ViSpeak: Visual Instruction Feedback in Streaming Videos
Shenghao Fu
Q. Yang
Yuan-Ming Li
Yi-Xing Peng
Kun-Yu Lin
Xihan Wei
Jian-Fang Hu
Xiaohua Xie
Wei-Shi Zheng
VLM
299
10
0
17 Mar 2025
VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Yueqian Wang
Xiaojun Meng
Yijiao Wang
Jianxin Liang
Jiansheng Wei
Huishuai Zhang
Dongyan Zhao
VGen
268
19
0
27 Nov 2024