Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
arXiv:2311.18773 · 30 November 2023
Rohan Myer Krishnan, Zitian Tang, Zhiqiu Yu, Chen Sun
Papers citing "Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains" (6 of 6 papers shown)
Vamos: Versatile Action Models for Video Understanding (22 Nov 2023)
Shijie Wang, Qi Zhao, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun
21 · 19 · 0
Perception Test: A Diagnostic Benchmark for Multimodal Video Models (23 May 2023)
Viorica Pătrăucean, Lucas Smaira, Ankush Gupta, Adrià Recasens Continente, L. Markeeva, ..., Y. Aytar, Simon Osindero, Dima Damen, Andrew Zisserman, João Carreira
[VLM] 110 · 138 · 0
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models (30 Jan 2023)
Junnan Li, Dongxu Li, Silvio Savarese, Steven C. H. Hoi
[VLM, MLLM] 244 · 4,186 · 0
Temporal Action Segmentation: An Analysis of Modern Techniques (19 Oct 2022)
Guodong Ding, Fadime Sener, Angela Yao
27 · 73 · 0
Ego4D: Around the World in 3,000 Hours of Egocentric Video (13 Oct 2021)
Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, ..., Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik
[EgoV] 212 · 682 · 0
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding (28 Sep 2021)
Hu Xu, Gargi Ghosh, Po-Yao (Bernie) Huang, Dmytro Okhonko, Armen Aghajanyan, Florian Metze, Luke Zettlemoyer, Christoph Feichtenhofer
[CLIP, VLM] 242 · 554 · 0