ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.06355
  4. Cited By
VideoChat: Chat-Centric Video Understanding
v1v2 (latest)

VideoChat: Chat-Centric Video Understanding

10 May 2023
Kunchang Li
Yinan He
Yi Wang
Yizhuo Li
Wen Wang
Ping Luo
Yali Wang
Limin Wang
Yu Qiao
    MLLM
ArXiv (abs)PDFHTMLHuggingFace (3 upvotes)Github (3246★)

Papers citing "VideoChat: Chat-Centric Video Understanding"

13 / 563 papers shown
MIMIC-IT: Multi-Modal In-Context Instruction Tuning
MIMIC-IT: Multi-Modal In-Context Instruction Tuning
Yue Liu
Yuanhan Zhang
Liangyu Chen
Jinghao Wang
Fanyi Pu
Jingkang Yang
Xuefei Liu
Ziwei Liu
MLLMVLM
304
291
0
08 Jun 2023
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for
  Pre-training and Benchmarks
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks
Haiyang Xu
Qinghao Ye
Xuan-Wei Wu
Mingshi Yan
Yuan Miao
...
Qingfang Qian
Maofei Que
Ji Zhang
Xiaoyan Zeng
Feiyan Huang
VLMMLLM
184
34
0
07 Jun 2023
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video
  Understanding
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video UnderstandingConference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Hang Zhang
Xin Li
Lidong Bing
MLLM
578
1,508
0
05 Jun 2023
OpenVIS: Open-vocabulary Video Instance Segmentation
OpenVIS: Open-vocabulary Video Instance SegmentationAAAI Conference on Artificial Intelligence (AAAI), 2023
Pinxue Guo
Tony Huang
Peiyang He
Xuefeng Liu
Tianjun Xiao
Zhaoyu Chen
Wenqiang Zhang
VLM
233
24
0
26 May 2023
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of ThoughtNeural Information Processing Systems (NeurIPS), 2023
Yao Mu
Qinglong Zhang
Mengkang Hu
Wen Wang
Mingyu Ding
Jun Jin
Sijin Yu
Jifeng Dai
Yu Qiao
Ping Luo
LM&RoLRM
416
353
0
24 May 2023
ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space
  Manipulation
ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space Manipulation
Dongxu Yue
Qin Guo
Munan Ning
Jiaxi Cui
Yuesheng Zhu
Liuliang Yuan
DiffM
318
15
0
24 May 2023
TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at
  Scale
TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale
Ziyun Zeng
Yixiao Ge
Zhan Tong
Xihui Liu
Shutao Xia
Ying Shan
285
13
0
23 May 2023
VideoLLM: Modeling Video Sequence with Large Language Models
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
264
114
0
22 May 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for
  Vision-Centric Tasks
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric TasksNeural Information Processing Systems (NeurIPS), 2023
Wen Wang
Zhe Chen
Xiaokang Chen
Jiannan Wu
Xizhou Zhu
...
Ping Luo
Tong Lu
Jie Zhou
Yu Qiao
Jifeng Dai
MLLMVLM
317
622
0
18 May 2023
InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT
  Beyond Language
InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language
Zhaoyang Liu
Yinan He
Wenhai Wang
Weiyun Wang
Yi Wang
...
Yali Wang
Limin Wang
Ping Luo
Jifeng Dai
Yu Qiao
LRMMLLM
399
107
0
09 May 2023
Otter: A Multi-Modal Model with In-Context Instruction Tuning
Otter: A Multi-Modal Model with In-Context Instruction TuningIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yue Liu
Yuanhan Zhang
Liangyu Chen
Jinghao Wang
Fanyi Pu
Joshua Adrian Cahyono
Jingkang Yang
Yu Qiao
MLLM
530
627
0
05 May 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large
  Language Models
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language ModelsInternational Conference on Learning Representations (ICLR), 2023
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLMMLLM
473
2,742
0
20 Apr 2023
Accountable Textual-Visual Chat Learns to Reject Human Instructions in
  Image Re-creation
Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation
Zhiwei Zhang
Yuliang Liu
MLLM
374
0
0
10 Mar 2023
Previous
123...101112
Page 12 of 12
Pageof 12