ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2404.17176
  4. Cited By
MovieChat+: Question-aware Sparse Memory for Long Video Question
  Answering

MovieChat+: Question-aware Sparse Memory for Long Video Question Answering

26 April 2024
Enxin Song
Wenhao Chai
Tianbo Ye
Lei Li
Xi Li
Gaoang Wang
    VLMMLLM
ArXiv (abs)PDFHTMLGithub (621★)

Papers citing "MovieChat+: Question-aware Sparse Memory for Long Video Question Answering"

37 / 37 papers shown
Title
LiveStar: Live Streaming Assistant for Real-World Online Video Understanding
LiveStar: Live Streaming Assistant for Real-World Online Video Understanding
Zhenyu Yang
Kairui Zhang
Yuhang Hu
Bing Wang
Shengsheng Qian
Bin Wen
Fan Yang
Tingting Gao
Weiming Dong
Changsheng Xu
OffRLAI4TSVLM
176
0
0
07 Nov 2025
StreamingTOM: Streaming Token Compression for Efficient Video Understanding
StreamingTOM: Streaming Token Compression for Efficient Video Understanding
Xueyi Chen
Keda Tao
Kele Shao
Huan Wang
129
1
0
21 Oct 2025
A Survey on Agentic Multimodal Large Language Models
A Survey on Agentic Multimodal Large Language Models
Huanjin Yao
Ruifei Zhang
Jiaxing Huang
Jingyi Zhang
Yibo Wang
...
Ruolin Zhu
Yongcheng Jing
Shunyu Liu
Guanbin Li
Dacheng Tao
LM&RoAIFinAI4TSLRMAI4CE
185
4
0
13 Oct 2025
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
Yunlong Tang
Jing Bi
Pinxin Liu
Zhenyu Pan
Mingqian Feng
...
Zeliang Zhang
Daiki Shimada
Han Liu
Jiebo Luo
Chenliang Xu
MLLMOffRLVLMLRM
502
7
0
06 Oct 2025
VideoNSA: Native Sparse Attention Scales Video Understanding
VideoNSA: Native Sparse Attention Scales Video Understanding
Enxin Song
Wenhao Chai
Shusheng Yang
Ethan Armand
Xiaojun Shan
Haiyang Xu
Jianwen Xie
Zhuowen Tu
100
1
0
02 Oct 2025
Dense Video Understanding with Gated Residual Tokenization
Dense Video Understanding with Gated Residual Tokenization
Haichao Zhang
Wenhao Chai
Shwai He
Ang Li
Yun Fu
VGen
98
0
0
17 Sep 2025
AdsQA: Towards Advertisement Video Understanding
AdsQA: Towards Advertisement Video Understanding
Xinwei Long
Kai Tian
Peng Xu
Guoli Jia
Jingxuan Li
...
Che Jiang
Hao Xu
Yang Liu
Jiaheng Ma
Bowen Zhou
84
2
0
10 Sep 2025
StreamMem: Query-Agnostic KV Cache Memory for Streaming Video Understanding
StreamMem: Query-Agnostic KV Cache Memory for Streaming Video Understanding
Yanlai Yang
Zhuokai Zhao
Satya Narayan Shukla
Aashu Singh
Shlok Kumar Mishra
Lizhu Zhang
Mengye Ren
VLM
84
5
0
21 Aug 2025
InterAct-Video: Reasoning-Rich Video QA for Urban Traffic
InterAct-Video: Reasoning-Rich Video QA for Urban Traffic
Joseph Raj Vishal
Rutuja Patil
Manas Srinivas Gowda
Katha Naik
Yezhou Yang
Bharatesh Chakravarthi
Bharatesh Chakravarthi
118
0
0
19 Jul 2025
AuroraLong: Bringing RNNs Back to Efficient Open-Ended Video Understanding
AuroraLong: Bringing RNNs Back to Efficient Open-Ended Video Understanding
Weili Xu
Enxin Song
Wenhao Chai
Xuexiang Wen
Tian-Chun Ye
Gaoang Wang
256
3
0
03 Jul 2025
Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding
Movie Facts and Fibs (MF2^22): A Benchmark for Long Movie Understanding
Emmanouil Zaranis
António Farinhas
Saul Santos
Beatriz Canaverde
Miguel Moura Ramos
...
Raffaella Bernardi
Raquel Fernández
Sandro Pezzelle
Vlad Niculae
Andre F. T. Martins
199
3
0
06 Jun 2025
CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms
CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms
Shilin Yan
Jiaming Han
Joey Tsai
Hongwei Xue
Rongyao Fang
Lingyi Hong
Ziyu Guo
Ray Zhang
VLM
225
5
0
22 May 2025
RAVU: Retrieval Augmented Video Understanding with Compositional Reasoning over Graph
RAVU: Retrieval Augmented Video Understanding with Compositional Reasoning over Graph
Sameer Malik
Moyuru Yamada
Ayush Singh
Dishank Aggarwal
929
1
0
06 May 2025
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action
Jen-Hao Cheng
Vivian Wang
Huayu Wang
Huapeng Zhou
Yi-Hao Peng
...
Wenhao Chai
Yi-Ling Chen
Vibhav Vineet
Qin Cai
Lei Li
AI4TS
684
8
0
02 May 2025
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Enxin Song
Wenhao Chai
Weili Xu
Jianwen Xie
Yuxuan Liu
Gaoang Wang
329
19
0
20 Apr 2025
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video UnderstandingComputer Vision and Pattern Recognition (CVPR), 2025
Shehreen Azad
Vibhav Vineet
Yogesh S Rawat
VLM
975
11
0
11 Mar 2025
Question-Aware Gaussian Experts for Audio-Visual Question AnsweringComputer Vision and Pattern Recognition (CVPR), 2025
Hongyeob Kim
Inyoung Jung
Dayoon Suh
Youjia Zhang
Sangmin Lee
Sungeun Hong
312
5
0
06 Mar 2025
DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models
DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models
Ruizhe Chen
Wenhao Chai
Zhifei Yang
Xiaotian Zhang
Qiufeng Wang
Tony Q.S. Quek
Soujanya Poria
Zuozhu Liu
413
3
0
06 Mar 2025
LION-FS: Fast & Slow Video-Language Thinker as Online Video AssistantComputer Vision and Pattern Recognition (CVPR), 2025
Wei Li
Bing Hu
Rui Shao
Leyang Shen
Liqiang Nie
223
30
0
05 Mar 2025
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for
  Long-term Streaming Video and Audio Interactions
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Pan Zhang
Xiaoyi Dong
Yuhang Cao
Yuhang Zang
Rui Qian
...
Xinsong Zhang
Kai Chen
Yu Qiao
Dahua Lin
Jiaqi Wang
KELM
338
31
0
12 Dec 2024
SEAL: Semantic Attention Learning for Long Video Representation
SEAL: Semantic Attention Learning for Long Video RepresentationComputer Vision and Pattern Recognition (CVPR), 2024
Lan Wang
Yujia Chen
Wen-Sheng Chu
Vishnu Boddeti
Du Tran
VLM
438
7
0
02 Dec 2024
VideoSAVi: Self-Aligned Video Language Models without Human Supervision
VideoSAVi: Self-Aligned Video Language Models without Human Supervision
Yogesh Kulkarni
Pooyan Fazli
VLM
510
5
0
01 Dec 2024
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking
  with Motion-Aware Memory
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
Cheng-Yen Yang
Hsiang-Wei Huang
Wenhao Chai
Zhongyu Jiang
Lei Li
VLM
289
69
0
18 Nov 2024
TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models
Tingyu Qu
Mingxiao Li
Tinne Tuytelaars
Marie-Francine Moens
VLM
224
3
0
17 Nov 2024
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded TuningInternational Conference on Learning Representations (ICLR), 2024
Xiangyu Zeng
Kunchang Li
Chenting Wang
Xinhao Li
Tianxiang Jiang
...
Zhengrong Yue
Yi Wang
Yali Wang
Yu Qiao
Limin Wang
MLLMVLMAI4TS
235
52
0
25 Oct 2024
TRACE: Temporal Grounding Video LLM via Causal Event Modeling
TRACE: Temporal Grounding Video LLM via Causal Event ModelingInternational Conference on Learning Representations (ICLR), 2024
Yongxin Guo
Jingyu Liu
Mingda Li
Xiaoying Tang
Qingbin Liu
Xiaoying Tang
221
43
0
08 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
AuroraCap: Efficient, Performant Video Detailed Captioning and a New BenchmarkInternational Conference on Learning Representations (ICLR), 2024
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
533
87
0
04 Oct 2024
Optimization Proxies using Limited Labeled Data and Training Time -- A Semi-Supervised Bayesian Neural Network Approach
Optimization Proxies using Limited Labeled Data and Training Time -- A Semi-Supervised Bayesian Neural Network Approach
Parikshit Pareek
Abhijith Jayakumar
K. Sundar
Sidhant Misra
Sidhant Misra
405
19
0
04 Oct 2024
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Haotian Zhang
Mingfei Gao
Zhe Gan
Philipp Dufter
Nina Wenzel
...
Haoxuan You
Zirui Wang
Afshin Dehghan
Peter Grasch
Yinfei Yang
VLMMLLM
263
64
1
30 Sep 2024
Enhancing Long Video Understanding via Hierarchical Event-Based Memory
Enhancing Long Video Understanding via Hierarchical Event-Based Memory
Dingxin Cheng
Mingda Li
Jingyu Liu
Yongxin Guo
Bin Jiang
Qingbin Liu
Xi Chen
Bo Zhao
195
12
0
10 Sep 2024
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language
  Models
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
Mingze Xu
Mingfei Gao
Zhe Gan
Hong-You Chen
Zhengfeng Lai
Haiming Gang
Kai Kang
Afshin Dehghan
213
90
0
22 Jul 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model
  Supporting Long-Contextual Input and Output
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Pan Zhang
Xiaoyi Dong
Yuhang Zang
Yuhang Cao
Rui Qian
...
Kai Chen
Jifeng Dai
Yu Qiao
Dahua Lin
Jiaqi Wang
228
167
0
03 Jul 2024
Through the Theory of Mind's Eye: Reading Minds with Multimodal Video Large Language Models
Through the Theory of Mind's Eye: Reading Minds with Multimodal Video Large Language Models
Z. Chen
Tianchun Wang
Yizhou Wang
Michal Kosinski
Xiang Zhang
Yun Fu
Sheng Li
LRM
202
6
0
19 Jun 2024
DrVideo: Document Retrieval Based Long Video Understanding
DrVideo: Document Retrieval Based Long Video UnderstandingComputer Vision and Pattern Recognition (CVPR), 2024
Ziyu Ma
Chenhui Gou
Hengcan Shi
Bin Sun
Shutao Li
Hamid Rezatofighi
Jianfei Cai
VLM
182
35
0
18 Jun 2024
CityCraft: A Real Crafter for 3D City Generation
CityCraft: A Real Crafter for 3D City Generation
Jie Deng
Wenhao Chai
Junsheng Huang
Zhonghan Zhao
Qixuan Huang
...
Shengyu Hao
Wenhao Hu
Lei Li
X. Li
Gaoang Wang
155
24
0
07 Jun 2024
Streaming Long Video Understanding with Large Language Models
Streaming Long Video Understanding with Large Language Models
Rui Qian
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Shuangrui Ding
Dahua Lin
Yuan Liu
VLM
215
105
0
25 May 2024
See and Think: Embodied Agent in Virtual Environment
See and Think: Embodied Agent in Virtual EnvironmentEuropean Conference on Computer Vision (ECCV), 2023
Zhonghan Zhao
Wenhao Chai
Xuan Wang
Li Boyi
Shengyu Hao
Shidong Cao
Tianbo Ye
Gaoang Wang
LM&RoLLMAG
316
50
0
26 Nov 2023
1