Self-Adaptive Sampling for Efficient Video Question-Answering on Image--Text Models

9 July 2023

Papers citing "Self-Adaptive Sampling for Efficient Video Question-Answering on Image--Text Models"


Title | Authors | Tags | Counts | Date
VideoAgent: Long-form Video Understanding with Large Language Model as Agent | Xiaohan Wang, Yuhui Zhang, Orr Zohar, Serena Yeung-Levy | VLM | 103, 83, 0 | 15 Mar 2024
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | Junnan Li, Dongxu Li, Caiming Xiong, S. Hoi | MLLM, BDL, VLM, CLIP | 382, 4,010, 0 | 28 Jan 2022
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision | Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu H. Pham, Quoc V. Le, Yun-hsuan Sung, Zhen Li, Tom Duerig | VLM, CLIP | 293, 3,683, 0 | 11 Feb 2021