Is a Video worth $n\times n$ Images? A Highly Efficient Approach to Transformer-based Video Question Answering

16 May 2023

Papers citing "Is a Video worth $n\times n$ Images? A Highly Efficient Approach to Transformer-based Video Question Answering"

A CLIP-Hitchhiker's Guide to Long Video Retrieval. Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman. 17 May 2022. [CLIP]
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding. Hu Xu, Gargi Ghosh, Po-Yao (Bernie) Huang, Dmytro Okhonko, Armen Aghajanyan, Florian Metze, Luke Zettlemoyer, Christoph Feichtenhofer. 28 Sep 2021. [CLIP, VLM]
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval. Huaishao Luo, Lei Ji, Ming Zhong, Yang Chen, Wen Lei, Nan Duan, Tianrui Li. 18 Apr 2021. [CLIP, VLM]