VideoChat: Chat-Centric Video Understanding

10 May 2023
Kunchang Li
Yinan He
Yi Wang
Yizhuo Li
Wen Wang
Ping Luo
Yali Wang
Limin Wang
Yu Qiao
    MLLM
ArXiv (abs) | PDF | HTML | HuggingFace (3 upvotes) | GitHub (3246★)

Papers citing "VideoChat: Chat-Centric Video Understanding"

50 / 558 papers shown
TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision
Ayush Gupta
A. Roy
Rama Chellappa
Nathaniel D. Bastian
Alvaro Velasquez
Susmit Jha
145
0
0
11 Jun 2025
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
Jinyoung Park
Jeehye Na
Jinyoung Kim
H. Kim
OffRL
312
18
0
09 Jun 2025
EgoM2P: Egocentric Multimodal Multitask Pretraining
Gen Li
Yutong Chen
Yiqian Wu
Kaifeng Zhao
Marc Pollefeys
Siyu Tang
EgoV, VLM
367
4
0
09 Jun 2025
Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding
Boyu Chen
Siran Chen
Kunchang Li
Qinglin Xu
Yu Qiao
Yali Wang
VOS
287
7
0
09 Jun 2025
MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks
Sanjoy Chowdhury
Mohamed Elmoghany
Yohan Abeysinghe
Mahmoud Ahmed
Sayan Nag
Salman Khan
Mohamed Elhoseiny
Dinesh Manocha
325
4
0
08 Jun 2025
Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding
Emmanouil Zaranis
António Farinhas
Saul Santos
Beatriz Canaverde
Miguel Moura Ramos
...
Raffaella Bernardi
Raquel Fernández
Sandro Pezzelle
Vlad Niculae
Andre F. T. Martins
219
3
0
06 Jun 2025
Pts3D-LLM: Studying the Impact of Token Structure for 3D Scene Understanding With Large Language Models
Hugues Thomas
Chen Chen
Jian Zhang
185
0
0
06 Jun 2025
Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments
Di Wen
Lei Qi
Kunyu Peng
Kailun Yang
Fei Teng
...
Yufan Chen
R. Liu
Yitian Shi
M. Sarfraz
Rainer Stiefelhagen
362
0
0
03 Jun 2025
Unraveling Spatio-Temporal Foundation Models via the Pipeline Lens: A Comprehensive Review
Yuchen Fang
Hao Miao
Yuxuan Liang
Liwei Deng
Yue Cui
...
Yan Zhao
T. Pedersen
Christian S. Jensen
Xiaofang Zhou
Kai Zheng
AI4TS, AI4CE
232
5
0
02 Jun 2025
Align is not Enough: Multimodal Universal Jailbreak Attack against Multimodal Large Language Models
Youze Wang
Wenbo Hu
Yinpeng Dong
Jing Liu
Hanwang Zhang
Richang Hong
212
7
0
02 Jun 2025
DisTime: Distribution-based Time Representation for Video Large Language Models
Yingsen Zeng
Zepeng Huang
Yujie Zhong
Chengjian Feng
Jie Hu
Lin Ma
Yang Liu
VGen
230
3
0
30 May 2025
Grid-LOGAT: Grid Based Local and Global Area Transcription for Video Question Answering
International Conference on Information Photonics (ICIP), 2025
Md Intisar Chowdhury
Kittinun Aukkapinyo
Hiroshi Fujimura
Joo Ann Woo
Wasu Wasusatein
Fadoua Ghourabi
258
0
0
30 May 2025
Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model
Computer Vision and Pattern Recognition (CVPR), 2025
Yuting Zhang
Hao Lu
Qingyong Hu
Yin Wang
Kaishen Yuan
Xin Liu
Kaishun Wu
MLLM, LRM
211
4
0
30 May 2025
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
Diankun Wu
Fangfu Liu
Yi-Hsin Hung
Yueqi Duan
LRM
256
59
0
29 May 2025
Vid-SME: Membership Inference Attacks against Large Video Understanding Models
Qi Li
Runpeng Yu
Xinchao Wang
271
4
0
29 May 2025
VidText: Towards Comprehensive Evaluation for Video Text Understanding
Zhoufaran Yang
Yan Shu
Zhifei Yang
Yan Zhang
...
Gangyan Zeng
Yu Zhou
Andrii Zadaianchuk
Nicu Sebe
CoGe
333
4
0
28 May 2025
HoliTom: Holistic Token Merging for Fast Video Large Language Models
Kele Shao
Keda Tao
Can Qin
Haoxuan You
Yang Sui
Huan Wang
VLM
557
14
0
27 May 2025
Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment
Xiaojun Jia
Sensen Gao
Simeng Qin
Tianyu Pang
C. Du
Yihao Huang
Xinfeng Li
Yiming Li
Bo Li
Wenshu Fan
AAML
234
10
0
27 May 2025
HuMoCon: Concept Discovery for Human Motion Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Qihang Fang
Chengcheng Tang
Bugra Tekin
Shugao Ma
Yanchao Yang
166
1
0
27 May 2025
Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought
Chao Huang
Benfeng Wang
Jie Wen
Chengliang Liu
Wei Wang
Li Shen
Xiaochun Cao
LRM
267
5
0
26 May 2025
Instructify: Demystifying Metadata to Visual Instruction Tuning Data Conversion
Jacob A. Hansen
Wei Lin
Junmo Kang
M. Jehanzeb Mirza
Hongyin Luo
Rogerio Feris
Alan Ritter
James R. Glass
Leonid Karlinsky
VLM
406
1
0
23 May 2025
Panoptic Captioning: An Equivalence Bridge for Image and Text
Kun-Yu Lin
Hongjun Wang
Weining Ren
Kai Han
607
0
0
22 May 2025
From Evaluation to Defense: Advancing Safety in Video Large Language Models
Yiwei Sun
Peiqi Jiang
Chuanbin Liu
Luohao Lin
Zhiying Lu
Hongtao Xie
185
1
0
22 May 2025
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation
Wentao Ma
Weiming Ren
Yiming Jia
Zhuofeng Li
Ping Nie
Ge Zhang
Wenhu Chen
253
5
0
20 May 2025
Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model
Yong Ren
Chenxing Li
Le Xu
Hao Gu
Duzhen Zhang
Yujie Chen
Manjie Xu
Ruibo Fu
Shan Yang
Dong Yu
LRM
421
1
0
19 May 2025
Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding
Thong Nguyen
Zhiyuan Hu
Xu Lin
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
VLM
344
1
0
19 May 2025
Sage Deer: A Super-Aligned Driving Generalist Is Your Copilot
Hao Lu
Jiaqi Tang
Jiyao Wang
Yaojie Lu
Xu Cao
...
Bin Huang
Dengbo He
Shuiguang Deng
Hao Chen
Ying-Cong Chen
266
1
0
15 May 2025
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning
Zhenghao Xing
Xiaowei Hu
Chi-Wing Fu
Wei Wang
Jifeng Dai
Pheng-Ann Heng
MLLM, OffRL, VLM, LRM
306
12
0
07 May 2025
"I Can See Forever!": Evaluating Real-time VideoLLMs for Assisting Individuals with Visual Impairments
Zheng Zhang
Zhen Sun
Zhenru Zhang
Zifan Peng
Yuemeng Zhao
Liang Luo
Zeren Luo
Ruiting Zuo
Xinlei He
231
2
0
07 May 2025
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video
Shuhang Xun
Sicheng Tao
Jiajun Li
Yibo Shi
Zhixin Lin
...
Shikang Wang
Wenshu Fan
Hao Zhang
Ying Ma
Xuming Hu
VLM, LRM
363
4
0
04 May 2025
Enhancing the Learning Experience: Using Vision-Language Models to Generate Questions for Educational Videos
International Conference on Artificial Intelligence in Education (AIED), 2025
Markos Stamatakis
Joshua Berger
Christian Wartena
Ralph Ewerth
Anett Hoppe
AI4Ed
305
1
0
03 May 2025
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
Zongxia Li
Xiyang Wu
Guangyao Shi
Yubin Qin
Hongyang Du
Tianyi Zhou
Wanrong Zhu
Dinesh Manocha
Jordan Lee Boyd-Graber
MLLM
547
0
0
02 May 2025
AdCare-VLM: Towards a Unified and Pre-aligned Latent Representation for Healthcare Video Understanding
Md Asaduzzaman Jabin
Hanqi Jiang
Yuchen Ren
Patrick Kaggwa
Eugene Douglass
Juliet N. Sekandi
Tianming Liu
LM&MA
420
0
0
01 May 2025
A Survey of Foundation Model-Powered Recommender Systems: From Feature-Based, Generative to Agentic Paradigms
Chengkai Huang
Hongtao Huang
Tong Yu
Kaige Xie
Junda Wu
Shuai Zhang
Julian McAuley
Dietmar Jannach
Lina Yao
LRM, AI4CE
274
7
0
23 Apr 2025
MR. Video: "MapReduce" is the Principle for Long Video Understanding
Ziqi Pang
Yu-Xiong Wang
VLM
229
5
0
22 Apr 2025
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
David Ma
Yanzhe Zhang
J. Ren
Jarvis Guo
Yifan Yao
...
Shiwen Ni
Jing Liu
Wenhao Huang
Ge Zhang
Xiaojie Jin
VLM
298
3
0
21 Apr 2025
ResNetVLLM -- Multi-modal Vision LLM for the Video Understanding Task
Ahmad Khalil
Mahmoud Khalil
A. Ngom
VLM
252
1
0
20 Apr 2025
ResNetVLLM-2: Addressing ResNetVLLM's Multi-Modal Hallucinations
Ahmad Khalil
Mahmoud Khalil
A. Ngom
MLLM, VLM
263
1
0
20 Apr 2025
Grounding-MD: Grounded Video-language Pre-training for Open-World Moment Detection
Weijun Zhuang
Qizhang Li
Xin Li
Ming-Yu Liu
Xiaopeng Hong
Feng Gao
Fan Yang
W. Zuo
242
1
0
20 Apr 2025
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
Haojian Huang
Haodong Chen
Shengqiong Wu
Meng Luo
Jinlan Fu
Xinya Du
Hao Zhang
Hao Fei
AI4TS
922
8
0
17 Apr 2025
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization
Pritam Sarkar
Ali Etemad
302
2
0
16 Apr 2025
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Yang Shi
Jiaheng Liu
Yushuo Guan
Zhikai Wu
Yujiao Shi
...
Bohan Zeng
Wei Zhang
Fuzheng Zhang
Wenjing Yang
Di Zhang
VGen, VLM
347
11
0
14 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM, VLM
529
739
1
14 Apr 2025
Multimodal Long Video Modeling Based on Temporal Dynamic Context
Haoran Hao
Jiaming Han
Yiyuan Zhang
Xiangyu Yue
416
0
0
14 Apr 2025
SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Yangliu Hu
Zikai Song
Na Feng
Yawei Luo
Junqing Yu
Yi-Ping Phoebe Chen
Wei Yang
157
10
0
10 Apr 2025
Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding
Dibyadip Chatterjee
Edoardo Remelli
Yale Song
Bugra Tekin
Abhay Mittal
...
Shreyas Hampali
Eric Sauser
Shugao Ma
Angela Yao
Fadime Sener
VLM
238
3
0
10 Apr 2025
How Can Objects Help Video-Language Understanding?
Zitian Tang
Shijie Wang
Junho Cho
Jaewook Yoo
Chen Sun
306
1
0
10 Apr 2025
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
Xinhao Li
Ziang Yan
Desen Meng
Yi Liu
Xiangyu Zeng
Yinan He
Yun Wang
Yu Qiao
Yi Wang
Limin Wang
VLM, AI4TS, LRM
738
114
0
09 Apr 2025
PaMi-VDPO: Mitigating Video Hallucinations by Prompt-Aware Multi-Instance Video Preference Learning
Xinpeng Ding
Jianchao Tan
Jinahua Han
Lanqing Hong
Hang Xu
Xuelong Li
MLLM, VLM
1.1K
3
0
08 Apr 2025
REVEAL: Relation-based Video Representation Learning for Video-Question-Answering
Sofian Chaybouti
Walid Bousselham
Moritz Wolter
Hilde Kuehne
836
0
0
07 Apr 2025