VidCoM: Fast Video Comprehension through Large Language Models with
Multimodal Tools

VidCoM: Fast Video Comprehension through Large Language Models with Multimodal Tools

16 October 2023

Lei Hou

Papers citing "VidCoM: Fast Video Comprehension through Large Language Models with Multimodal Tools"

4 / 4 papers shown

Title
Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos Teng Wang Jinrui Zhang Feng Zheng Wenhao Jiang Ran Cheng Ping Luo VLM 26 10 0 11 Mar 2023
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners Zhenhailong Wang Manling Li Ruochen Xu Luowei Zhou Jie Lei ... Chenguang Zhu Derek Hoiem Shih-Fu Chang Mohit Bansal Heng Ji MLLM VLM 167 134 0 22 May 2022
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 301 11,730 0 04 Mar 2022
Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering Jungin Park Jiyoung Lee K. Sohn 123 99 0 29 Apr 2021