Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

Computer Vision and Pattern Recognition (CVPR), 2025

6 January 2025

Papers citing "Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction"

18 / 18 papers shown

StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos

225

01 Dec 2025

StreamKV: Streaming Video Question-Answering with Segment-based KV Cache Retrieval and Compression

140

10 Nov 2025

Cambrian-S: Towards Spatial Supersensing in Video

175

06 Nov 2025

StreamingTOM: Streaming Token Compression for Efficient Video Understanding

194

21 Oct 2025

video-SALMONN S: Streaming Audio-Visual LLMs Beyond Length Limits via Memory

13 Oct 2025

StreamForest: Efficient Online Video Understanding with Persistent Event Memory

231

29 Sep 2025

Boosting Embodied AI Agents through Perception-Generation Disaggregation and Asynchronous Pipeline Execution

115

11 Sep 2025

Less Redundancy: Boosting Practicality of Vision Language Model in Walking Assistants

139

22 Aug 2025

StreamMem: Query-Agnostic KV Cache Memory for Streaming Video Understanding

124

21 Aug 2025

HumanPCR: Probing MLLM Capabilities in Diverse Human-Centric Scenes

180

19 Aug 2025

StreamAgent: Towards Anticipatory Agents for Streaming Video Understanding

343

03 Aug 2025

Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI

ACM Workshop on Hot Topics in Networks (HotNets), 2025

169

14 Jul 2025

HoliTom: Holistic Token Merging for Fast Video Large Language Models

618

27 May 2025

StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant

616

08 May 2025

RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video

378

04 May 2025

TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

275

24 Apr 2025

ViSpeak: Visual Instruction Feedback in Streaming Videos

299

17 Mar 2025

VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

268

27 Nov 2024