VideoChat: Chat-Centric Video Understanding

10 May 2023
Kunchang Li
Yinan He
Yi Wang
Yizhuo Li
Wen Wang
Ping Luo
Yali Wang
Limin Wang
Yu Qiao
    MLLM
ArXiv (abs) | PDF | HTML | HuggingFace (3 upvotes) | GitHub (3246★)

Papers citing "VideoChat: Chat-Centric Video Understanding"

50 / 563 papers shown
User Intent Recognition and Satisfaction with Large Language Models: A User Study with ChatGPT
Anna Bodonhelyi
Efe Bozkir
Shuo Yang
Enkelejda Kasneci
Gjergji Kasneci
ELM, AI4MH
192
28
0
03 Feb 2024
A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming
Pengyuan Zhou
Lin Wang
Zhi Liu
Yanbin Hao
Pan Hui
Sasu Tarkoma
J. Kangasharju
VGen
255
50
0
30 Jan 2024
GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition
Guangzhao Dai
Xiangbo Shu
Wenhao Wu
Rui Yan
Jiachao Zhang
VLM
432
9
0
18 Jan 2024
On the Audio Hallucinations in Large Audio-Video Language Models
Taichi Nishimura
Shota Nakada
Masayoshi Kondo
VLM
233
12
0
18 Jan 2024
Vlogger: Make Your Dream A Vlog
Computer Vision and Pattern Recognition (CVPR), 2024
Shaobin Zhuang
Kunchang Li
Xinyuan Chen
Yaohui Wang
Ziwei Liu
Yu Qiao
Yali Wang
VGen, DiffM
156
64
0
17 Jan 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
International Conference on Machine Learning (ICML), 2024
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro, LLMAG
515
64
0
16 Jan 2024
Towards A Better Metric for Text-to-Video Generation
Jay Zhangjie Wu
Guian Fang
Haoning Wu
Xintao Wang
Yixiao Ge
...
Rui Zhao
Weisi Lin
Wynne Hsu
Ying Shan
Mike Zheng Shou
VGen
256
44
0
15 Jan 2024
ModaVerse: Efficiently Transforming Modalities with LLMs
Computer Vision and Pattern Recognition (CVPR), 2024
Xinyu Wang
Bohan Zhuang
Qi Wu
204
23
0
12 Jan 2024
Distilling Vision-Language Models on Millions of Videos
Computer Vision and Pattern Recognition (CVPR), 2024
Yue Zhao
Long Zhao
Xingyi Zhou
Jialin Wu
Chun-Te Chu
...
Hartwig Adam
Ting Liu
Boqing Gong
Philipp Krahenbuhl
Liangzhe Yuan
VLM
283
20
0
11 Jan 2024
Video Anomaly Detection and Explanation via Large Language Models
Hui Lv
Qianru Sun
253
53
0
11 Jan 2024
SonicVisionLM: Playing Sound with Vision Language Models
Computer Vision and Pattern Recognition (CVPR), 2024
Zhifeng Xie
Shengye Yu
Qile He
Mengtian Li
VLM, VGen
189
3
0
09 Jan 2024
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Aleksandar Stanić
Sergi Caelles
Michael Tschannen
LRM, VLM
330
13
0
03 Jan 2024
Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models
Computer Vision and Pattern Recognition (CVPR), 2024
Xinpeng Ding
Jianhua Han
Hang Xu
Xiaodan Liang
Wei Zhang
Xiaomeng Li
314
84
0
02 Jan 2024
Taking the Next Step with Generative Artificial Intelligence: The Transformative Role of Multimodal Large Language Models in Science Education
Learning and Individual Differences (LID), 2024
Arne Bewersdorff
Christian Hartmann
Marie Hornberger
Kathrin Seßler
Maria Bannert
Enkelejda Kasneci
Gjergji Kasneci
Xiaoming Zhai
Claudia Nerdel
302
87
0
01 Jan 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Chenliang Xu
Jiebo Luo
Chenliang Xu
VLM
760
174
0
29 Dec 2023
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM, MLLM
283
274
0
28 Dec 2023
Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos
Houlun Chen
Xin Wang
Hong Chen
Zihan Song
Jia Jia
Wenwu Zhu
LRM
247
18
0
28 Dec 2023
Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey
Jiaxing Huang
Jingyi Zhang
Kai Jiang
Han Qiu
Shijian Lu
198
30
0
27 Dec 2023
Plan, Posture and Go: Towards Open-World Text-to-Motion Generation
Jinpeng Liu
Wen-Dao Dai
Chunyu Wang
Yiji Cheng
Yansong Tang
Xin Tong
VGen, DiffM
284
24
0
22 Dec 2023
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM, MLLM
649
2,210
0
21 Dec 2023
LLM4VG: Large Language Models Evaluation for Video Grounding
Wei Feng
Xin Wang
Hong Chen
Zeyang Zhang
Zihan Song
Yuwei Zhou
Wenwu Zhu
408
10
0
21 Dec 2023
Generative Multimodal Models are In-Context Learners
Quan-Sen Sun
Yufeng Cui
Xiaosong Zhang
Fan Zhang
Qiying Yu
...
Yueze Wang
Yongming Rao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM, LRM
398
422
0
20 Dec 2023
VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering
Chun-Mei Feng
Yang Bai
Yaoyu Zhang
Zhen Li
Salman Khan
Wangmeng Zuo
Xinxing Xu
Rick Siow Mong Goh
Yong-Jin Liu
273
10
0
19 Dec 2023
DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving
Wenhai Wang
Jiangwei Xie
ChuanYang Hu
Haoming Zou
Jianan Fan
Wenwen Tong
Yang Wen
Silei Wu
Hanming Deng
Zhiqi Li
364
217
0
14 Dec 2023
Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers
Neural Information Processing Systems (NeurIPS), 2023
Haifeng Huang
Zehan Wang
Rongjie Huang
Luping Liu
Xize Cheng
Yang Zhao
Tao Jin
Zhou Zhao
334
12
0
13 Dec 2023
Vista-LLaMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens
Fan Ma
Xiaojie Jin
Heng Wang
Yuchen Xian
Jiashi Feng
Yi Yang
263
20
0
12 Dec 2023
Honeybee: Locality-enhanced Projector for Multimodal LLM
Junbum Cha
Wooyoung Kang
Jonghwan Mun
Byungseok Roh
MLLM
402
199
0
11 Dec 2023
TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation
Rongkun Zheng
Lu Qi
Xi Chen
Yi Wang
Kun Wang
Yu Qiao
Hengshuang Zhao
364
2
0
11 Dec 2023
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
Yi Chen
Yuying Ge
Yixiao Ge
Mingyu Ding
Bohao Li
Rui Wang
Rui-Lan Xu
Ying Shan
Xihui Liu
LLMAG, ELM, LRM
371
33
0
11 Dec 2023
Audio-Visual LLM for Video Understanding
Fangxun Shu
Lei Zhang
Hao Jiang
Cihang Xie
VLM, MLLM
254
68
0
11 Dec 2023
LvBench: A Benchmark for Long-form Video Understanding with Versatile Multi-modal Question Answering
Hongjie Zhang
Lu Dong
Yi Liu
Yifei Huang
Z. Ling
Yali Wang
Limin Wang
340
32
0
08 Dec 2023
GPT-4V with Emotion: A Zero-shot Benchmark for Generalized Emotion Recognition
Zheng Lian
Guoying Zhao
Haiyang Sun
Kang Chen
Zhuofan Wen
Hao Gu
Yinan Han
Jianhua Tao
268
82
0
07 Dec 2023
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
Computer Vision and Pattern Recognition (CVPR), 2023
Zhangyang Qi
Ye Fang
Zeyi Sun
Xiaoyang Wu
Tong Wu
Yuan Liu
Dahua Lin
Hengshuang Zhao
MLLM
479
36
0
05 Dec 2023
VaQuitA: Enhancing Alignment in LLM-Assisted Video Understanding
Yizhou Wang
Ruiyi Zhang
Haoliang Wang
Uttaran Bhattacharya
Yun Fu
Gang Wu
MLLM
243
19
0
04 Dec 2023
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2023
Shuhuai Ren
Linli Yao
Shicheng Li
Xu Sun
Lu Hou
VLM, MLLM
372
364
0
04 Dec 2023
Towards Learning a Generalist Model for Embodied Navigation
Computer Vision and Pattern Recognition (CVPR), 2023
Duo Zheng
Shijia Huang
Lin Zhao
Yiwu Zhong
Liwei Wang
LM&Ro
643
118
0
04 Dec 2023
Zero-Shot Video Question Answering with Procedural Programs
Rohan Choudhury
Koichiro Niinuma
Kishore Venkateshan
László A. Jeni
188
37
0
01 Dec 2023
Dolphins: Multimodal Language Model for Driving
European Conference on Computer Vision (ECCV), 2023
Yingzi Ma
Yulong Cao
Jiachen Sun
Marco Pavone
Chaowei Xiao
MLLM
330
125
0
01 Dec 2023
ChatPose: Chatting about 3D Human Pose
Computer Vision and Pattern Recognition (CVPR), 2023
Yao Feng
Jing Lin
Sai Kumar Dwivedi
Yu Sun
Priyanka Patel
Michael J. Black
3DH
298
68
0
30 Nov 2023
VTimeLLM: Empower LLM to Grasp Video Moments
Computer Vision and Pattern Recognition (CVPR), 2023
Bin Huang
Xin Wang
Hong Chen
Zihan Song
Wenwu Zhu
MLLM
328
244
0
30 Nov 2023
VBench: Comprehensive Benchmark Suite for Video Generative Models
Computer Vision and Pattern Recognition (CVPR), 2023
Ziqi Huang
Yinan He
Jiashuo Yu
Fan Zhang
Chenyang Si
...
Xinyuan Chen
Limin Wang
Dahua Lin
Yu Qiao
Ziwei Liu
VGen
523
1,001
0
29 Nov 2023
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Chaoyi Zhang
Kevin Qinghong Lin
Zhengyuan Yang
Jianfeng Wang
Linjie Li
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
VGen
258
49
0
29 Nov 2023
VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models
European Conference on Computer Vision (ECCV), 2023
Shicheng Li
Lei Li
Shuhuai Ren
Yuanxin Liu
Yi Liu
Rundong Gao
Xu Sun
Lu Hou
227
49
0
29 Nov 2023
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
European Conference on Computer Vision (ECCV), 2023
Yanwei Li
Chengyao Wang
Jiaya Jia
VLM, MLLM
333
480
0
28 Nov 2023
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Computer Vision and Pattern Recognition (CVPR), 2023
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
...
Jilan Xu
Guo Chen
Ping Luo
Limin Wang
Yu Qiao
VLM, MLLM
673
872
0
28 Nov 2023
SEED-Bench-2: Benchmarking Multimodal Large Language Models
Bohao Li
Yuying Ge
Yixiao Ge
Guangzhi Wang
Rui Wang
Ruimao Zhang
Ying Shan
MLLM, VLM
188
85
0
28 Nov 2023
AvatarGPT: All-in-One Framework for Motion Understanding, Planning, Generation and Beyond
Computer Vision and Pattern Recognition (CVPR), 2023
Zixiang Zhou
Yu Wan
Baoyuan Wang
191
52
0
28 Nov 2023
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
Munan Ning
Bin Zhu
Yujia Xie
Bin Lin
Jiaxi Cui
Lu Yuan
Dongdong Chen
Li-ming Yuan
ELM, MLLM
216
92
0
27 Nov 2023
ViT-Lens: Towards Omni-modal Representations
Computer Vision and Pattern Recognition (CVPR), 2023
Weixian Lei
Yixiao Ge
Kun Yi
Jianfeng Zhang
Difei Gao
Dylan Sun
Yuying Ge
Ying Shan
Mike Zheng Shou
208
32
0
27 Nov 2023
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Sijie Cheng
Zhicheng Guo
Jingwen Wu
Kechen Fang
Peng Li
Huaping Liu
Yang Liu
EgoV, LRM
265
48
0
27 Nov 2023