
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams

12 June 2024
Haoji Zhang
Yiqin Wang
Yansong Tang
Yong-Jin Liu
Jiashi Feng
Jifeng Dai
Xiaojie Jin

Papers citing "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"

17 / 17 papers shown
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
Haibo Wang
Bo Feng
Zhengfeng Lai
Mingze Xu
Shiyu Li
Weifeng Ge
Afshin Dehghan
Meng Cao
Ping-Chia Huang
OffRL
42
3
0
08 May 2025
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video
Shuhang Xun
Sicheng Tao
J. Li
Yibo Shi
Zhixin Lin
...
Shikang Wang
Y. Liu
H. Zhang
Ying Ma
Xuming Hu
VLM
LRM
41
0
0
04 May 2025
FSBench: A Figure Skating Benchmark for Advancing Artistic Sports Understanding
Rong Gao
Xin Liu
Zhuozhao Hu
Bohao Xing
Baiqiang Xia
Zitong Yu
H. Kalviainen
41
0
0
28 Apr 2025
Learning Streaming Video Representation via Multitask Training
Yibin Yan
Jilan Xu
Shangzhe Di
Yikun Liu
Yudi Shi
Qirui Chen
Zeqian Li
Yifei Huang
Weidi Xie
CLL
76
0
0
28 Apr 2025
VideoScan: Enabling Efficient Streaming Video Understanding via Frame-level Semantic Carriers
Ruanjun Li
Yuedong Tan
Yuanming Shi
Jiawei Shao
VLM
70
0
0
12 Mar 2025
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
Shehreen Azad
Vibhav Vineet
Y. S. Rawat
VLM
52
1
0
11 Mar 2025
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
Xin Ding
Hao Wu
Y. Yang
Shiqi Jiang
Donglin Bai
Zhibo Chen
Ting Cao
40
0
0
08 Mar 2025
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
Shangzhe Di
Zhelun Yu
Guanghao Zhang
Haoyuan Li
Tao Zhong
Hao Cheng
Bolin Li
Wanggui He
Fangxun Shu
Hao Jiang
53
4
0
01 Mar 2025
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
Xiangyu Zeng
Kunchang Li
Chenting Wang
Xinhao Li
Tianxiang Jiang
...
Zhengrong Yue
Yi Wang
Yali Wang
Yu Qiao
Limin Wang
MLLM
VLM
AI4TS
55
14
0
25 Oct 2024
Universal Segmentation at Arbitrary Granularity with Language Instruction
Yong Liu
Cairong Zhang
Yitong Wang
Jiahao Wang
Yujiu Yang
Yansong Tang
VLM
VOS
41
5
0
04 Dec 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin
Yang Ye
Bin Zhu
Jiaxi Cui
Munan Ning
Peng Jin
Li-ming Yuan
VLM
MLLM
185
576
0
16 Nov 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
380
4,010
0
28 Jan 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
218
682
0
13 Oct 2021
Dynamic Memory based Attention Network for Sequential Recommendation
Qiaoyu Tan
Jianwei Zhang
Ninghao Liu
Xiao Shi Huang
Hongxia Yang
Jingren Zhou
Xia Hu
HAI
85
59
0
18 Feb 2021
Towards Real-Time Multi-Object Tracking
Zhongdao Wang
Liang Zheng
Yixuan Liu
Yali Li
Shengjin Wang
VOT
232
844
0
27 Sep 2019