Versions: v1, v2 (latest)

Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams

12 June 2024

Haoji Zhang

Yiqin Wang

Yansong Tang

arXiv (abs) | PDF | HTML | HuggingFace (17 upvotes) | GitHub

Papers citing "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"

50 / 64 papers shown

StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos

295

30 Mar 2026

Can Multi-Modal LLMs Provide Live Step-by-Step Task Guidance?

Apratim Bhattacharyya

152

27 Nov 2025

Vision-Language Memory for Spatial Reasoning

359

25 Nov 2025

Solving Spatial Supersensing Without Spatial Supersensing

120

20 Nov 2025

StreamKV: Streaming Video Question-Answering with Segment-based KV Cache Retrieval and Compression

209

10 Nov 2025

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs

...

442

10 Nov 2025

AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving

184

09 Nov 2025

LiveStar: Live Streaming Assistant for Real-World Online Video Understanding

311

07 Nov 2025

Cambrian-S: Towards Spatial Supersensing in Video

...

219

06 Nov 2025

TeleEgo: Benchmarking Egocentric AI Assistants in the Wild

...

377

28 Oct 2025

StreamingTOM: Streaming Token Compression for Efficient Video Understanding

374

21 Oct 2025

MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues

...

Tianhao Peng

Jiaheng Liu

235

20 Oct 2025

Recurrent Attention-based Token Selection for Efficient Streaming Video-LLMs

153

20 Oct 2025

video-SALMONN S: Memory-Enhanced Streaming Audio-Visual LLM

124

13 Oct 2025

Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow

125

07 Oct 2025

StreamForest: Efficient Online Video Understanding with Persistent Event Memory

...

272

29 Sep 2025

FrameMind: Frame-Interleaved Video Reasoning via Reinforcement Learning

283

28 Sep 2025

Track-On2: Enhancing Online Point Tracking with Memory

298

23 Sep 2025

FineQuest: Adaptive Knowledge-Assisted Sports Video Understanding via Agent-of-Thoughts Reasoning

213

15 Sep 2025

See What You Need: Query-Aware Visual Intelligence through Reasoning-Perception Loops

139

25 Aug 2025

StreamMem: Query-Agnostic KV Cache Memory for Streaming Video Understanding

160

21 Aug 2025

JRDB-Reasoning: A Difficulty-Graded Benchmark for Visual Reasoning in Robotics

358

14 Aug 2025

HumanSense: From Multimodal Perception to Empathetic Context-Aware Responses through Reasoning MLLMs

European Workshop on Visual Information Processing (EUVIP), 2025

375

14 Aug 2025

Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory

534

13 Aug 2025

AURA: A Fine-Grained Benchmark and Decomposed Metric for Audio-Visual Reasoning

Siminfar Samakoush Galougah

258

10 Aug 2025

Hierarchical Event Memory for Accurate and Low-latency Online Video Temporal Grounding

194

06 Aug 2025

StreamAgent: Towards Anticipatory Agents for Streaming Video Understanding

...

452

03 Aug 2025

Scaling RL to Long Videos

...

539

10 Jul 2025

Spatio-Temporal LLM: Reasoning about Environments and Actions

271

07 Jul 2025

Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation

370

19 Jun 2025

MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks

506

08 Jun 2025

VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning

579

24 May 2025

Temporally-Grounded Language Generation: A Benchmark for Real-Time Vision-Language Models

Keunwoo Peter Yu

Joyce Chai

MLLM VLM

332

16 May 2025

StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant

708

08 May 2025

RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video

...

559

04 May 2025

FSBench: A Figure Skating Benchmark for Advancing Artistic Sports Understanding

Computer Vision and Pattern Recognition (CVPR), 2025

369

28 Apr 2025

Learning Streaming Video Representation via Multitask Training

551

28 Apr 2025

TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

...

334

24 Apr 2025

IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs

...

355

21 Apr 2025

TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding

398

02 Apr 2025

SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding

483

24 Mar 2025

Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models

Computer Vision and Pattern Recognition (CVPR), 2025

297

20 Mar 2025

M3: 3D-Spatial MultiModal Memory

International Conference on Learning Representations (ICLR), 2025

374

20 Mar 2025

ViSpeak: Visual Instruction Feedback in Streaming Videos

385

17 Mar 2025

VideoScan: Enabling Efficient Streaming Video Understanding via Frame-level Semantic Carriers

857

12 Mar 2025

HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding

Computer Vision and Pattern Recognition (CVPR), 2025

1.1K

11 Mar 2025

StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition

985

08 Mar 2025

LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant

Computer Vision and Pattern Recognition (CVPR), 2025

365

05 Mar 2025

Streaming Video Question-Answering with In-context Video KV-Cache Retrieval

International Conference on Learning Representations (ICLR), 2025

277

01 Mar 2025

SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding

International Conference on Learning Representations (ICLR), 2025

426

15 Feb 2025