VideoChat: Chat-Centric Video Understanding

10 May 2023

Yi Wang

Ping Luo

Yu Qiao

Papers citing "VideoChat: Chat-Centric Video Understanding"

13 / 563 papers shown

MIMIC-IT: Multi-Modal In-Context Instruction Tuning

Ziwei Liu

304

291

08 Jun 2023

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks

...

Ji Zhang

184

07 Jun 2023

Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Hang Zhang

Xin Li

Lidong Bing

MLLM

578

1,508

05 Jun 2023

OpenVIS: Open-vocabulary Video Instance Segmentation

AAAI Conference on Artificial Intelligence (AAAI), 2023

Tianjun Xiao

Zhaoyu Chen

Wenqiang Zhang

VLM

233

26 May 2023

EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought

Neural Information Processing Systems (NeurIPS), 2023

Mingyu Ding

Yu Qiao

Ping Luo

LM&Ro LRM

416

353

24 May 2023

ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space Manipulation

318

24 May 2023

TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale

Ying Shan

285

23 May 2023

VideoLLM: Modeling Video Sequence with Large Language Models

Yifei Huang

...

Yi Wang

Yu Qiao

264

114

22 May 2023

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

Neural Information Processing Systems (NeurIPS), 2023

...

Yu Qiao

317

622

18 May 2023

InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

Yi Wang

...

Ping Luo

Yu Qiao

399

107

09 May 2023

Otter: A Multi-Modal Model with In-Context Instruction Tuning

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

Joshua Adrian Cahyono

Jingkang Yang

Yu Qiao

MLLM

530

627

05 May 2023

MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models

International Conference on Learning Representations (ICLR), 2023

473

2,742

20 Apr 2023

Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation

Zhiwei Zhang

Yuliang Liu

MLLM

374

10 Mar 2023