arXiv: 2305.06355
VideoChat: Chat-Centric Video Understanding
Versions: v1, v2 (latest)

10 May 2023
Kunchang Li
Yinan He
Yi Wang
Yizhuo Li
Wen Wang
Ping Luo
Yali Wang
Limin Wang
Yu Qiao
    MLLM
ArXiv (abs) | PDF | HTML | HuggingFace (3 upvotes) | GitHub (3246★)

Papers citing "VideoChat: Chat-Centric Video Understanding"

50 / 558 papers shown
StreamForest: Efficient Online Video Understanding with Persistent Event Memory
Xiangyu Zeng
Kefan Qiu
Qingyu Zhang
Xinhao Li
Jing Wang
...
Kun Tian
Meng Tian
Xinhai Zhao
Yi Wang
Limin Wang
167
2
0
29 Sep 2025
UniVid: The Open-Source Unified Video Model
Jiabin Luo
Junhui Lin
Zeyu Zhang
Biao Wu
Meng Fang
Ling-Hao Chen
Hao Tang
VGen
234
6
0
29 Sep 2025
Beyond Isolated Facts: Synthesizing Narrative and Grounded Supervision for VideoQA
Jianxin Liang
Tan Yue
Yuxuan Wang
Yueqian Wang
Zhihan Yin
Huishuai Zhang
Dongyan Zhao
80
0
0
29 Sep 2025
Perceive, Reflect and Understand Long Video: Progressive Multi-Granular Clue Exploration with Interactive Agents
J. Li
Kun-Juan Wei
Zhe Xu
Zibo Su
Xu Yang
Cheng Deng
102
0
0
29 Sep 2025
NeMo: Needle in a Montage for Video-Language Understanding
Zi-Yuan Hu
Shuo Liang
Duo Zheng
Yanyang Li
Yeyao Tao
...
Jianguang Yu
Jing-ling Huang
Meng Fang
Yin Li
Liwei Wang
133
1
0
29 Sep 2025
Resolving Ambiguity in Gaze-Facilitated Visual Assistant Interaction Paradigm
Zeyu Wang
Baiyu Chen
Kun Yan
Hongjing Piao
Hao Xue
Flora D. Salim
Yuanchun Shi
Yuntao Wang
84
0
0
26 Sep 2025
VideoJudge: Bootstrapping Enables Scalable Supervision of MLLM-as-a-Judge for Video Understanding
Abdul Waheed
Zhen Wu
Dareen Alharthi
Seungone Kim
Bhiksha Raj
ELM
116
0
0
25 Sep 2025
Poisoning Prompt-Guided Sampling in Video Large Language Models
Yuxin Cao
Wei Song
Jingling Xue
Jin Song Dong
AAML
89
1
0
25 Sep 2025
VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception
Ziang Yan
Xinhao Li
Yinan He
Zhengrong Yue
Xiangyu Zeng
Yali Wang
Yu Qiao
Limin Wang
Yi Wang
MLLM, VLM, LRM
185
10
0
25 Sep 2025
iFinder: Structured Zero-Shot Vision-Based LLM Grounding for Dash-Cam Video Reasoning
Manyi Yao
Bingbing Zhuang
Sparsh Garg
Amit Roy-Chowdhury
Christian Shelton
Manmohan Chandraker
Abhishek Aich
LRM
162
0
0
23 Sep 2025
Steering Multimodal Large Language Models Decoding for Context-Aware Safety
Zheyuan Liu
Zhangchen Xu
Guangyao Dou
Xiangchi Yuan
Zhaoxuan Tan
Radha Poovendran
Meng Jiang
116
0
0
23 Sep 2025
UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning
Ye Liu
Zongyang Ma
Junfu Pu
Zhongang Qi
Yang Wu
Mingyu Ding
Chang Wen Chen
MLLM, ObjD, LRM
303
2
0
22 Sep 2025
TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs
Yunheng Li
Jing Cheng
Shaoyong Jia
Hangyi Kuang
Shaohui Jiao
Qibin Hou
Ming-Ming Cheng
AI4TS, VLM
184
5
0
22 Sep 2025
Adaptive Fast-and-Slow Visual Program Reasoning for Long-Form VideoQA
Chenglin Li
Feng Han
FengTao
Ruilin Li
Qianglong Chen
Jingqi Tong
Yin Zhang
Jiaqi Wang
LRM
153
0
0
22 Sep 2025
History-Aware Visuomotor Policy Learning via Point Tracking
Jingjing Chen
Hongjie Fang
Chenxi Wang
Shiquan Wang
Cewu Lu
128
1
0
21 Sep 2025
Cinéaste: A Fine-grained Contextual Movie Question Answering Benchmark
Nisarg A. Shah
Amir Ziai
Chaitanya Ekanadham
Vishal M. Patel
VGen, CoGe, ELM
105
0
0
17 Sep 2025
Enhancing Video Large Language Models with Structured Multi-Video Collaborative Reasoning
Zhihao He
Tianyao He
Yun Xu
Yun Xu
Huabin Liu
Chaofan Gan
Gui Zou
W. Lin
124
1
0
16 Sep 2025
Bridging Vision Language Models and Symbolic Grounding for Video Question Answering
Haodi Ma
Vyom Pathak
Daisy Zhe Wang
85
0
0
15 Sep 2025
Video Understanding by Design: How Datasets Shape Architectures and Insights
Lei Wang
Piotr Koniusz
Yongsheng Gao
3DV, VGen, AI4TS
209
0
0
11 Sep 2025
Chirality in Action: Time-Aware Video Representation Learning by Latent Straightening
Piyush Bagad
Andrew Zisserman
AI4TS
200
2
0
10 Sep 2025
Strefer: Empowering Video LLMs with Space-Time Referring and Reasoning via Synthetic Instruction Data
Honglu Zhou
Xiangyu Peng
Shrikant B. Kendre
Michael S Ryoo
Silvio Savarese
Caiming Xiong
Juan Carlos Niebles
92
1
0
03 Sep 2025
SurgLLM: A Versatile Large Multimodal Model with Spatial Focus and Temporal Awareness for Surgical Video Understanding
Zhen Chen
Xingjian Luo
Kun Yuan
J. Wu
Danny Tat Ming Chan
Nassir Navab
Hongbin Liu
Zhen Lei
Jiebo Luo
176
2
0
30 Aug 2025
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Weiyun Wang
Zhangwei Gao
Lixin Gu
Hengjun Pu
Long Cui
...
Bowen Zhou
Kai Chen
Yu Qiao
Wenhai Wang
Gen Luo
MLLM, LRM
262
212
0
25 Aug 2025
Directed-Tokens: A Robust Multi-Modality Alignment Approach to Large Language-Vision Models
Thanh-Dat Truong
Huu-Thien Tran
Tran Thai Son
Bhiksha Raj
Khoa Luu
214
1
0
19 Aug 2025
EgoLoc: A Generalizable Solution for Temporal Interaction Localization in Egocentric Videos
Junyi Ma
Erhang Zhang
Yin-Dong Zheng
Yuchen Xie
Yixuan Zhou
Hesheng Wang
232
0
0
17 Aug 2025
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Lin Long
Yexiao He
Wentao Ye
Yiyuan Pan
Yuan Lin
Hang Li
Junbo Zhao
Wei Li
290
7
0
13 Aug 2025
KFFocus: Highlighting Keyframes for Enhanced Video Understanding
Ming-Jun Nie
Chunwei Wang
Hang Xu
Li Zhang
VGen
81
0
0
12 Aug 2025
TAG: A Simple Yet Effective Temporal-Aware Approach for Zero-Shot Video Temporal Grounding
Jin-Seop Lee
SungJoon Lee
Jaehan Ahn
YunSeok Choi
Jee-Hyong Lee
VLM
106
2
0
11 Aug 2025
A Survey on Video Temporal Grounding with Multimodal Large Language Model
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Yue Yu
Wei Liu
Y. Liu
Meng-yang Liu
Liqiang Nie
Zhouchen Lin
C. Chen
AI4TS, VLM, LRM
137
6
0
07 Aug 2025
Training-Free Multimodal Large Language Model Orchestration
Tianyu Xie
Yuhang Wu
Yongdong Luo
Jinfa Huang
Xiawu Zheng
108
0
0
06 Aug 2025
Free-MoRef: Instantly Multiplexing Context Perception Capabilities of Video-MLLMs within Single Inference
Kuo Wang
Quanlong Zheng
Junlin Xie
Yanhao Zhang
Jinguo Luo
Haonan Lu
Guanbin Li
Fan Zhou
Guanbin Li
VLM
64
1
0
04 Aug 2025
VLM4D: Towards Spatiotemporal Awareness in Vision Language Models
Shijie Zhou
Alexander Vilesov
Xuehai He
Ziyu Wan
Shuwang Zhang
Aditya Nagachandra
Di Chang
DongDong Chen
Xin Eric Wang
A. Kadambi
VLM
158
0
0
04 Aug 2025
StreamAgent: Towards Anticipatory Agents for Streaming Video Understanding
Haolin Yang
Feilong Tang
Linxiao Zhao
Xiang An
Ming Hu
...
Yifan Lu
Xiaofeng Zhang
Abdalla Swikir
Junjun He
Zongyuan Ge
271
2
0
03 Aug 2025
Beyond Gloss: A Hand-Centric Framework for Gloss-Free Sign Language Translation
Sobhan Asasi
Mohamed Ilyas Lakhal
Ozge Mercanoglu Sincan
Richard Bowden
SLR
170
0
0
31 Jul 2025
FMimic: Foundation Models are Fine-grained Action Learners from Human Videos
The International Journal of Robotics Research (IJRR), 2025
Guangyan Chen
Meiling Wang
Te Cui
Yao Mu
Haoyang Lu
...
Mengxiao Hu
Tianxing Zhou
M. Fu
Yi Yang
Yufeng Yue
LM&Ro, VLM
101
5
0
28 Jul 2025
Towards Universal Modal Tracking with Online Dense Temporal Token Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Yaozong Zheng
Bineng Zhong
Qihua Liang
Shengping Zhang
Guorong Li
Xianxian Li
Rongrong Ji
133
19
0
27 Jul 2025
When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios
Kele Shao
Keda Tao
Kejia Zhang
Sicheng Feng
Mu Cai
Yuzhang Shang
Haoxuan You
Can Qin
Yang Sui
Huan Wang
453
10
0
27 Jul 2025
Datasets and Recipes for Video Temporal Grounding via Reinforcement Learning
Ruizhe Chen
Zhiting Fan
Tianze Luo
Heqing Zou
Zhaopeng Feng
Guiyang Xie
Hansheng Zhang
Zhuochen Wang
Zuozhu Liu
Huaijian Zhang
AI4TS
135
7
0
24 Jul 2025
EgoExoBench: A Benchmark for First- and Third-person View Video Understanding in MLLMs
Yuping He
Yifei Huang
Guo Chen
Baoqi Pei
Jilan Xu
Tong Lu
Jiangmiao Pang
EgoV
195
10
0
24 Jul 2025
DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding
Xiaoyi Bao
Chenwei Xie
Hao Tang
Tingyu Weng
Xiaofeng Wang
Yun Zheng
Xingang Wang
VGen
123
1
0
21 Jul 2025
Enhancing Visual Planning with Auxiliary Tasks and Multi-token Prediction
Ce Zhang
Yale Song
Ruta Desai
Michael L. Iuzzolino
Joseph Tighe
Gedas Bertasius
Satwik Kottur
155
1
0
20 Jul 2025
Scaling RL to Long Videos
Yukang Chen
Wei Huang
Baifeng Shi
Qinghao Hu
Hanrong Ye
...
Xiaojuan Qi
Sifei Liu
Hongxu Yin
Yao Lu
Song Han
OffRL, AI4TS, VLM, LRM
314
33
0
10 Jul 2025
HumanVideo-MME: Benchmarking MLLMs for Human-Centric Video Understanding
Rui Yu
J. Zhang
Zhenye Gan
Qingdong He
Xiaobin Hu
...
Chengjie Wang
Zhucun Xue
Chaoyou Fu
Xinwei He
Xiang Bai
VLM
97
0
0
07 Jul 2025
CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model
Computer Vision and Pattern Recognition (CVPR), 2025
Ziyu Yao
Xuxin Cheng
Zhiqi Huang
Lei Li
332
5
0
01 Jul 2025
ActAlign: Zero-Shot Fine-Grained Video Classification via Language-Guided Sequence Alignment
Amir Aghdam
Vincent Tao Hu
Bjorn Ommer
VLM
231
2
0
28 Jun 2025
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing
Huadai Liu
Kaicheng Luo
Jialei Wang
Wen Wang
Qian Chen
Zhou Zhao
Wei Xue
VGen, LRM
337
13
0
26 Jun 2025
SurgVidLM: Towards Multi-grained Surgical Video Understanding with Large Language Model
Guankun Wang
Junyi Wang
Wenjin Mo
Long Bai
Kun Yuan
...
N. Padoy
Zhen Lei
Hongbin Liu
Nassir Navab
Hongliang Ren
145
2
0
22 Jun 2025
PR-DETR: Injecting Position and Relation Prior for Dense Video Captioning
Yizhe Li
Sanping Zhou
Zheng Qin
Le Wang
ViT
160
0
0
19 Jun 2025
SmartHome-Bench: A Comprehensive Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal Large Language Models
Xinyi Zhao
Congjing Zhang
Pei Guo
Wei Li
Lin Chen
Chaoyue Zhao
Shuai Huang
171
1
0
15 Jun 2025
DaMO: A Data-Efficient Multimodal Orchestrator for Temporal Reasoning with Video LLMs
Bo-Cheng Chiu
Jen-Jee Chen
Yu-Chee Tseng
Feng-Chi Chen
237
0
0
13 Jun 2025