2406.08085
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams
12 June 2024
Haoji Zhang
Yiqin Wang
Yansong Tang
Yong-Jin Liu
Jiashi Feng
Jifeng Dai
Xiaojie Jin
Papers citing
"Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"
50 / 64 papers shown
StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos
Daeun Lee
Subhojyoti Mukherjee
Branislav Kveton
Ryan Rossi
Viet Dac Lai
Seunghyun Yoon
Trung Bui
Franck Dernoncourt
Mohit Bansal
RALM
LRM
295
3
0
30 Mar 2026
Can Multi-Modal LLMs Provide Live Step-by-Step Task Guidance?
Apratim Bhattacharyya
Bicheng Xu
Sanjay Haresh
Reza Pourreza
Litian Liu
Sunny Panchal
Pulkit Madan
Leonid Sigal
Roland Memisevic
152
1
0
27 Nov 2025
Vision-Language Memory for Spatial Reasoning
Zuntao Liu
Yi Du
Taimeng Fu
Shaoshu Su
Cherie Ho
Chen Wang
VLM
LRM
359
0
0
25 Nov 2025
Solving Spatial Supersensing Without Spatial Supersensing
Vishaal Udandarao
Shyamgopal Karthik
Surabhi S. Nath
Andreas Hochlehnert
Matthias Bethge
Ameya Prabhu
120
0
0
20 Nov 2025
StreamKV: Streaming Video Question-Answering with Segment-based KV Cache Retrieval and Compression
Yilong Chen
Xiang Bai
Zhibin Wang
Chengyu Bai
Yuhan Dai
Ming Lu
Shanghang Zhang
209
9
0
10 Nov 2025
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
Tianhao Peng
Haochen Wang
Yuanxing Zhang
Zekun Wang
Zili Wang
...
Wei Ji
Pengfei Wan
Wenhao Huang
Zhaoxiang Zhang
Jiaheng Liu
ELM
442
4
0
10 Nov 2025
AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving
Ruifei Zhang
Junlin Xie
Wei Emma Zhang
Weikai Chen
Xiao Tan
Xiang Wan
G. Li
184
4
0
09 Nov 2025
LiveStar: Live Streaming Assistant for Real-World Online Video Understanding
Zhenyu Yang
Kairui Zhang
Yuhang Hu
Bing Wang
Shengsheng Qian
Bin Wen
Fan Yang
Tingting Gao
Weiming Dong
Changsheng Xu
OffRL
AI4TS
VLM
311
6
0
07 Nov 2025
Cambrian-S: Towards Spatial Supersensing in Video
Shusheng Yang
J. Yang
Pinzhi Huang
Ellis L Brown
Zihao Yang
...
Daohan Lu
Rob Fergus
Yann LeCun
Li Fei-Fei
Saining Xie
219
43
0
06 Nov 2025
TeleEgo: Benchmarking Egocentric AI Assistants in the Wild
Jiaqi Yan
Ruilong Ren
J. Liu
Shuning Xu
Ling Wang
...
Dell Zhang
Hao Sun
Chi Zhang
Xuelong Li
377
3
0
28 Oct 2025
StreamingTOM: Streaming Token Compression for Efficient Video Understanding
Xueyi Chen
Keda Tao
Kele Shao
Huan Wang
374
15
0
21 Oct 2025
MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues
Yaning Pan
Z. Wang
Qianqian Xie
Yongqian Wen
Y. Zhang
...
An Ping
Tianhao Peng
Jiaheng Liu
235
4
0
20 Oct 2025
Recurrent Attention-based Token Selection for Efficient Streaming Video-LLMs
Vaggelis Dorovatas
Soroush Seifi
Gunshi Gupta
Rahaf Aljundi
153
3
0
20 Oct 2025
video-SALMONN S: Memory-Enhanced Streaming Audio-Visual LLM
Guangzhi Sun
Yixuan Li
Xiaodong Wu
Yudong Yang
Wei Li
Zejun Ma
Chao Zhang
124
1
0
13 Oct 2025
Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow
Ruyang Liu
Shangkun Sun
Haoran Tang
Ge Li
Wei-Nan Gao
VGen
VLM
125
5
0
07 Oct 2025
StreamForest: Efficient Online Video Understanding with Persistent Event Memory
Xiangyu Zeng
Kefan Qiu
Qingyu Zhang
Xinhao Li
Jing Wang
...
Kun Tian
Meng Tian
Xinhai Zhao
Yi Wang
Limin Wang
272
22
0
29 Sep 2025
FrameMind: Frame-Interleaved Video Reasoning via Reinforcement Learning
Haonan Ge
Yiwei Wang
Kai-Wei Chang
Hang Wu
Yujun Cai
LRM
283
0
0
28 Sep 2025
Track-On2: Enhancing Online Point Tracking with Memory
Görkay Aydemir
Weidi Xie
Fatma Guney
VOT
3DV
298
3
0
23 Sep 2025
FineQuest: Adaptive Knowledge-Assisted Sports Video Understanding via Agent-of-Thoughts Reasoning
Haodong Chen
Haojian Huang
XinXiang Yin
Dian Shao
LRM
213
3
0
15 Sep 2025
See What You Need: Query-Aware Visual Intelligence through Reasoning-Perception Loops
Zixuan Dong
Baoyun Peng
Y. Wang
Lin Liu
Xinxin Dong
Yunlong Cao
Xiaodong Wang
LRM
139
1
0
25 Aug 2025
StreamMem: Query-Agnostic KV Cache Memory for Streaming Video Understanding
Yanlai Yang
Zhuokai Zhao
Satya Narayan Shukla
Aashu Singh
Shlok Kumar Mishra
Lizhu Zhang
Mengye Ren
VLM
160
23
0
21 Aug 2025
JRDB-Reasoning: A Difficulty-Graded Benchmark for Visual Reasoning in Robotics
Simindokht Jahangard
Mehrzad Mohammadi
Yi Shen
Zhixi Cai
Hamid Rezatofighi
358
2
0
14 Aug 2025
HumanSense: From Multimodal Perception to Empathetic Context-Aware Responses through Reasoning MLLMs
European Workshop on Visual Information Processing (EUVIP), 2025
Zheng Qin
Ruobing Zheng
Yabing Wang
Tianqi Li
Yi Yuan
Jingdong Chen
Le Wang
LRM
375
2
0
14 Aug 2025
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Lin Long
Yexiao He
Wentao Ye
Yiyuan Pan
Yuan Lin
Hang Li
Junbo Zhao
Wei Li
534
34
0
13 Aug 2025
AURA: A Fine-Grained Benchmark and Decomposed Metric for Audio-Visual Reasoning
Siminfar Samakoush Galougah
Rishie Raj
Sanjoy Chowdhury
Sayan Nag
Ramani Duraiswami
258
4
0
10 Aug 2025
Hierarchical Event Memory for Accurate and Low-latency Online Video Temporal Grounding
Minghang Zheng
Yuxin Peng
Benyuan Sun
Yi Yang
Yang Liu
194
0
0
06 Aug 2025
StreamAgent: Towards Anticipatory Agents for Streaming Video Understanding
Haolin Yang
Feilong Tang
Linxiao Zhao
Xiang An
Ming Hu
...
Yifan Lu
Xiaofeng Zhang
Abdalla Swikir
Junjun He
Zongyuan Ge
452
13
0
03 Aug 2025
Scaling RL to Long Videos
Yukang Chen
Wei Huang
Baifeng Shi
Qinghao Hu
Hanrong Ye
...
Xiaojuan Qi
Sifei Liu
Hongxu Yin
Yao Lu
Song Han
OffRL
AI4TS
VLM
LRM
539
64
0
10 Jul 2025
Spatio-Temporal LLM: Reasoning about Environments and Actions
Haozhen Zheng
Beitong Tian
Mingyuan Wu
Zhenggang Tang
Klara Nahrstedt
Alex Schwing
LRM
271
4
0
07 Jul 2025
Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation
Yong-Jin Liu
SongLi Wu
Sule Bai
Jiahao Wang
Yitong Wang
Yansong Tang
VLM
VOS
370
3
0
19 Jun 2025
MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks
Sanjoy Chowdhury
Mohamed Elmoghany
Yohan Abeysinghe
Mahmoud Ahmed
Sayan Nag
Salman Khan
Mohamed Elhoseiny
Dinesh Manocha
506
7
0
08 Jun 2025
VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning
Guanxing Lu
Wenkai Guo
Chubin Zhang
Yuheng Zhou
Haonan Jiang
Zifeng Gao
Yansong Tang
Ziwei Wang
OffRL
579
95
0
24 May 2025
Temporally-Grounded Language Generation: A Benchmark for Real-Time Vision-Language Models
Keunwoo Peter Yu
Joyce Chai
MLLM
VLM
332
0
0
16 May 2025
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
Haibo Wang
Bo Feng
Zhengfeng Lai
Mingze Xu
Shiyu Li
Weifeng Ge
Afshin Dehghan
Meng Cao
Ping Huang
OffRL
708
17
0
08 May 2025
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video
Shuhang Xun
Sicheng Tao
Jiajun Li
Yibo Shi
Zhixin Lin
...
Shikang Wang
Wenshu Fan
Hao Zhang
Ying Ma
Xuming Hu
VLM
LRM
559
11
0
04 May 2025
FSBench: A Figure Skating Benchmark for Advancing Artistic Sports Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Rong Gao
Xin Liu
Zhuozhao Hu
Bohao Xing
Baiqiang Xia
Zitong Yu
Heikki Kälviäinen
369
5
0
28 Apr 2025
Learning Streaming Video Representation via Multitask Training
Yibin Yan
Jilan Xu
Shangzhe Di
Yikun Liu
Yudi Shi
Qirui Chen
Zeqian Li
Yifei Huang
Weidi Xie
CLL
551
5
0
28 Apr 2025
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
Linli Yao
You Li
Y. X. Wei
Lei Li
Shuhuai Ren
...
Sida Li
Dianbo Sui
Qi Liu
Yanzhe Zhang
Xu Sun
334
48
0
24 Apr 2025
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
David Ma
Yanzhe Zhang
J. Ren
Jarvis Guo
Yifan Yao
...
Shiwen Ni
Jing Liu
Wenhao Huang
Ge Zhang
Xiaojie Jin
VLM
355
4
0
21 Apr 2025
TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding
Junwen Pan
Rui Zhang
Xin Wan
Yuan Zhang
Ming Lu
Qi She
VLM
398
4
0
02 Apr 2025
SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
Mingze Xu
Mingfei Gao
Shiyu Li
Jiasen Lu
Zhe Gan
Zhengfeng Lai
Meng Cao
Kai Kang
Yue Yang
Afshin Dehghan
483
23
0
24 Mar 2025
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Zhihang Liu
Chen-Wei Xie
Nianzu Yang
Liming Zhao
Longxiang Tang
Yun Zheng
Chuanbin Liu
Hongtao Xie
VLM
297
18
0
20 Mar 2025
M3: 3D-Spatial MultiModal Memory
International Conference on Learning Representations (ICLR), 2025
Xueyan Zou
Yuchen Song
Ri-Zhao Qiu
Xuanbin Peng
Jianglong Ye
Sifei Liu
Xiaolong Wang
3DGS
374
2
0
20 Mar 2025
ViSpeak: Visual Instruction Feedback in Streaming Videos
Shenghao Fu
Q. Yang
Yuan-Ming Li
Yi-Xing Peng
Kun-Yu Lin
Xihan Wei
Jian-Fang Hu
Xiaohua Xie
Wei-Shi Zheng
VLM
385
21
0
17 Mar 2025
VideoScan: Enabling Efficient Streaming Video Understanding via Frame-level Semantic Carriers
Ruanjun Li
Yuedong Tan
Yuanming Shi
Jiawei Shao
VLM
857
6
0
12 Mar 2025
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Shehreen Azad
Vibhav Vineet
Yogesh S Rawat
VLM
1.1K
15
0
11 Mar 2025
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
Xin Ding
Hao Wu
Yue Yang
Shiqi Jiang
Donglin Bai
Zhibo Chen
Ting Cao
985
17
0
08 Mar 2025
LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant
Computer Vision and Pattern Recognition (CVPR), 2025
Wei Li
Bing Hu
Rui Shao
Leyang Shen
Liqiang Nie
365
45
0
05 Mar 2025
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
International Conference on Learning Representations (ICLR), 2025
Shangzhe Di
Zhelun Yu
Guanghao Zhang
Haoyuan Li
Tao Zhong
Hao Cheng
Bolin Li
Wanggui He
Fangxun Shu
Hao Jiang
277
65
0
01 Mar 2025
SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding
International Conference on Learning Representations (ICLR), 2025
Zhenyu Yang
Yihan Hu
Zemin Du
Dizhan Xue
Chuanrui Hu
Jiahong Wu
Fan Yang
Weiming Dong
Changsheng Xu
426
36
0
15 Feb 2025