
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition

Main: 9 Pages
7 Figures
Bibliography: 4 Pages
9 Tables
Abstract

With the rise of real-world human-AI interaction applications such as AI assistants, the need for Streaming Video Dialogue has become critical. To address this need, we introduce StreamMind, a video LLM framework that achieves ultra-FPS streaming video processing (100 fps on a single A100) and enables proactive, always-on responses in real time, without explicit user intervention.
