VideoChat: Chat-Centric Video Understanding

10 May 2023
Kunchang Li
Yinan He
Yi Wang
Yizhuo Li
Wen Wang
Ping Luo
Yali Wang
Limin Wang
Yu Qiao
    MLLM
ArXiv (abs) | PDF | HTML | HuggingFace (3 upvotes) | GitHub (3246★)

Papers citing "VideoChat: Chat-Centric Video Understanding"

50 / 563 papers shown
SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Yangliu Hu
Zikai Song
Na Feng
Yawei Luo
Junqing Yu
Yi-Ping Phoebe Chen
Wei Yang
200
11
0
10 Apr 2025
How Can Objects Help Video-Language Understanding?
Zitian Tang
Shijie Wang
Junho Cho
Jaewook Yoo
Chen Sun
356
3
0
10 Apr 2025
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
Xinhao Li
Ziang Yan
Desen Meng
Yi Liu
Xiangyu Zeng
Yinan He
Yun Wang
Yu Qiao
Yi Wang
Limin Wang
VLM, AI4TS, LRM
819
123
0
09 Apr 2025
PaMi-VDPO: Mitigating Video Hallucinations by Prompt-Aware Multi-Instance Video Preference Learning
Xinpeng Ding
Jianchao Tan
Jinahua Han
Lanqing Hong
Hang Xu
Xuelong Li
MLLM, VLM
1.1K
6
0
08 Apr 2025
REVEAL: Relation-based Video Representation Learning for Video-Question-Answering
Sofian Chaybouti
Walid Bousselham
Moritz Wolter
Hilde Kuehne
888
0
0
07 Apr 2025
REEF: Relevance-Aware and Efficient LLM Adapter for Video Understanding
Sakib Reza
Xiyun Song
Heather Yu
Zongfang Lin
Mohsen Moghaddam
Mario Sznaier
277
0
0
07 Apr 2025
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting
Yunlong Tang
Jing Bi
Chao Huang
Susan Liang
Daiki Shimada
...
Jinxi He
Liu He
Zeliang Zhang
Jiebo Luo
Chenliang Xu
272
8
0
07 Apr 2025
AVadCLIP: Audio-Visual Collaboration for Robust Video Anomaly Detection
Peng Wu
Wanshun Su
Guansong Pang
Yujia Sun
Qingsen Yan
Peng Wang
Yujiao Shi
VLM
315
5
0
06 Apr 2025
Window Token Concatenation for Efficient Visual Large Language Models
Jiayi Zhang
Wentao Bao
Botao Ye
Zhen Tan
Tianlong Chen
Huan Liu
Yu Kong
VLM
277
1
0
05 Apr 2025
Enhancing Traffic Incident Response through Sub-Second Temporal Localization with HybridMamba
Ibne Farabi Shihab
Sanjeda Akter
Anuj Sharma
Mamba
397
2
0
04 Apr 2025
Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation
Chuanqi Cheng
Jian Guan
Wei Wu
Rui Yan
VLM
681
18
0
03 Apr 2025
Aligned Better, Listen Better for Audio-Visual Large Language Models
International Conference on Learning Representations (ICLR), 2025
Yuxin Guo
Shuailei Ma
Shijie Ma
Xiaoyi Bao
Chen-Wei Xie
Kecheng Zheng
Tingyu Weng
Siyang Sun
Yun Zheng
Wei Zou
MLLM, AuLLM
324
8
0
02 Apr 2025
Slow-Fast Architecture for Video Multi-Modal Large Language Models
Min Shi
Shihao Wang
Chieh-Yun Chen
Jitesh Jain
Kai Wang
Junjun Xiong
Guilin Liu
Zhiding Yu
Humphrey Shi
235
7
0
02 Apr 2025
TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding
Junwen Pan
Rui Zhang
Xin Wan
Yuan Zhang
Ming Lu
Qi She
VLM
313
4
0
02 Apr 2025
STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?
Yongbin Li
Yujiao Shi
Tao Lin
Xiangrui Liu
Wenxiao Cai
Zhengyang Liang
Bo Zhao
LRM
624
36
0
31 Mar 2025
BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Shuming Liu
Chen Zhao
Tianqi Xu
Bernard Ghanem
VLM
296
26
0
27 Mar 2025
Vision-to-Music Generation: A Survey
Zhaokai Wang
Chenxi Bao
Le Zhuo
Jingrui Han
Yang Yue
Yihong Tang
Victor Shea-Jay Huang
Yue Liao
EGVM, VGen
348
3
0
27 Mar 2025
Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving
Yue Li
Meng Tian
Zhenyu Lin
Jiangtong Zhu
Dechang Zhu
Haiqiang Liu
Zining Wang
Yueyi Zhang
Zhiwei Xiong
Xinhai Zhao
CoGe, VLM
372
12
0
27 Mar 2025
Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model
Abdelrahman M. Shaker
Muhammad Maaz
Chenhui Gou
Hamid Rezatofighi
Salman Khan
Fahad Shahbaz Khan
930
3
0
27 Mar 2025
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment
Yucheng Suo
Fan Ma
Linchao Zhu
T. Wang
Fengyun Rao
Yi Yang
LRM
317
5
0
26 Mar 2025
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector
Computer Vision and Pattern Recognition (CVPR), 2025
Xiao Guo
Xiufeng Song
Yue Zhang
Xiaohong Liu
Xuyang Liu
405
24
0
26 Mar 2025
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
Computer Vision and Pattern Recognition (CVPR), 2025
Shijie Zhou
Hui Ren
Yijia Weng
Shuwang Zhang
Zhen Wang
...
Zhiwen Fan
Suya You
Ziyi Wang
Leonidas Guibas
A. Kadambi
VGen, 3DGS
370
5
0
26 Mar 2025
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation
Hongcheng Gao
Jiashu Qu
Jingyi Tang
Baolong Bi
Yi Liu
Hongyu Chen
Li Liang
Li Su
Qingming Huang
MLLM, VLM, LRM
438
13
0
25 Mar 2025
SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
Mingze Xu
Mingfei Gao
Shiyu Li
Jiasen Lu
Zhe Gan
Zhengfeng Lai
Meng Cao
Kai Kang
Yue Yang
Afshin Dehghan
429
15
0
24 Mar 2025
Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks
Computer Vision and Pattern Recognition (CVPR), 2025
Nina Shvetsova
Arsha Nagrani
Bernt Schiele
Hilde Kuehne
Christian Rupprecht
268
1
0
24 Mar 2025
Breaking the Encoder Barrier for Seamless Video-Language Understanding
Handong Li
Yiyuan Zhang
Longteng Guo
Xiangyu Yue
Jing Liu
VLM
332
3
0
24 Mar 2025
Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding
Xiangrui Liu
Yan Shu
Zhengyang Liang
Ao Li
Yang Tian
Bo Zhao
VGen, VLM
561
30
0
24 Mar 2025
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
Wenxuan Zhu
Bing Li
Cheng Zheng
Jinjie Mai
Jun-Cheng Chen
...
Abdullah Hamdi
Sara Rojas Martinez
Chia-Wen Lin
Mohamed Elhoseiny
Bernard Ghanem
VLM
277
1
0
22 Mar 2025
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
Keda Tao
Haoxuan You
Yang Sui
Can Qin
Haoyu Wang
VLM, MQ
390
8
0
20 Mar 2025
MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations
Computer Vision and Pattern Recognition (CVPR), 2025
Kyungho Bae
Jinhyung Kim
Sihaeng Lee
Soonyoung Lee
G. Lee
Jinwoo Choi
301
12
0
20 Mar 2025
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Zhihang Liu
Chen-Wei Xie
Nianzu Yang
Liming Zhao
Longxiang Tang
Yun Zheng
Chuanbin Liu
Hongtao Xie
VLM
255
14
0
20 Mar 2025
DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Han Wang
Kai Hu
Liangcai Gao
642
2
0
20 Mar 2025
Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering
Thanh-Son Nguyen
Hong Yang
Tzeh Yuan Neoh
Hao Zhang
Ee Yeo Keat
Basura Fernando
NAI
436
3
0
19 Mar 2025
MusicInfuser: Making Video Diffusion Listen and Dance
Susung Hong
Ira Kemelmacher-Shlizerman
Brian L. Curless
Steven M. Seitz
VGen
354
1
0
18 Mar 2025
Quantum EigenGame for excited state calculation
David Quiroga
Jason Han
Anastasios Kyrillidis
283
5
0
17 Mar 2025
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
Wenshu Fan
Kevin Qinghong Lin
C. Chen
Mike Zheng Shou
LM&Ro, LRM
941
37
0
17 Mar 2025
Efficient Motion-Aware Video MLLM
Computer Vision and Pattern Recognition (CVPR), 2025
Zijia Zhao
Yuqi Huo
Tongtian Yue
Longteng Guo
Haoyu Lu
Binghai Wang
Xin Wu
Qingbin Liu
263
4
0
17 Mar 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Longji Xu
Shengqiong Wu
Yujiao Shi
William Yang Wang
Ziwei Liu
Jiebo Luo
Hao Fei
LRM
618
114
0
16 Mar 2025
V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
Zixu Cheng
Jian Hu
Ziquan Liu
Chenyang Si
Wei Li
Shaogang Gong
LRM
336
26
0
14 Mar 2025
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
Weiming Ren
Wentao Ma
Huan Yang
Cong Wei
Ge Zhang
Lei Ma
Mamba
315
20
0
14 Mar 2025
Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing
Yudong Liu
Jingwei Sun
Yueqian Lin
Jingyang Zhang
Ming Yin
Qinsi Wang
Jing Zhang
Haoyang Li
Yiran Chen
VLM
516
6
0
13 Mar 2025
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
Boyu Chen
Zhengrong Yue
Siran Chen
Xiping Hu
Yang Liu
Ziwei Sun
Longji Xu
VLM
1.3K
21
0
13 Mar 2025
Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding
Haoyu Zhang
Qiaohui Chu
Meng Liu
Yunxiao Wang
Bin Wen
Fan Yang
EgoV
525
12
0
12 Mar 2025
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
Computer Vision and Pattern Recognition (CVPR), 2025
Md. Mohaiminul Islam
Tushar Nagarajan
Huiyu Wang
Gedas Bertasius
Lorenzo Torresani
1.0K
11
0
12 Mar 2025
VideoScan: Enabling Efficient Streaming Video Understanding via Frame-level Semantic Carriers
Ruanjun Li
Yuedong Tan
Yuanming Shi
Jiawei Shao
VLM
729
4
0
12 Mar 2025
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
Computer Vision and Pattern Recognition (CVPR), 2025
Kevin Qinghong Lin
Mike Zheng Shou
VGen
1.0K
3
0
12 Mar 2025
Memory-enhanced Retrieval Augmentation for Long Video Understanding
Huaying Yuan
Zhengyang Liang
Minhao Qin
Hongjin Qian
Yan Shu
Zhicheng Dou
Ji-Rong Wen
Andrii Zadaianchuk
VOS, RALM, VLM
363
9
0
12 Mar 2025
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Shehreen Azad
Vibhav Vineet
Yogesh S Rawat
VLM
1.1K
12
0
11 Mar 2025
Towards Large-scale Chemical Reaction Image Parsing via a Multimodal Large Language Model
Chemical Science (Chem. Sci.), 2025
Yufan Chen
Ching Ting Leung
Jianwei Sun
Yong Huang
Linyan Li
Hao Chen
Hanyu Gao
242
4
0
11 Mar 2025
RAG-Adapter: A Plug-and-Play RAG-enhanced Framework for Long Video Understanding
Xichen Tan
Yunfan Ye
Yuanjing Luo
Qian Wan
Fang Liu
Zhiping Cai
VLM
248
3
0
11 Mar 2025