Understanding Long Videos with Multimodal Language Models

Understanding Long Videos with Multimodal Language Models

25 March 2024

Kanchana Ranasinghe

Kumara Kahatapitiya

Michael S. Ryoo

Papers citing "Understanding Long Videos with Multimodal Language Models"

12 / 12 papers shown

Title
Memory Consolidation Enables Long-Context Video Understanding Ivana Balavzević Yuge Shi Pinelopi Papalampidi Rahma Chaabouni Skanda Koppula Olivier J. Hénaff 84 22 0 08 Feb 2024
Understanding Video Transformers via Universal Concept Discovery M. Kowal Achal Dave Rares Ambrus Adrien Gaidon Konstantinos G. Derpanis P. Tokmakov ViT 19 2 0 19 Jan 2024
A Simple LLM Framework for Long-Range Video Question-Answering Ce Zhang Taixi Lu Md. Mohaiminul Islam Ziyang Wang Shoubin Yu Mohit Bansal Gedas Bertasius 95 80 0 28 Dec 2023
Language as the Medium: Multimodal Video Classification through text only Laura Hanu A. Vero James Thewlis 28 3 0 19 Sep 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality Qinghao Ye Haiyang Xu Guohai Xu Jiabo Ye Ming Yan ... Junfeng Tian Qiang Qi Ji Zhang Feiyan Huang Jingren Zhou VLM MLLM 198 883 0 27 Apr 2023
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge Wei Lin Leonid Karlinsky Nina Shvetsova Horst Possegger Mateusz Koziñski Rameswar Panda Rogerio Feris Hilde Kuehne Horst Bischof VLM 92 38 0 15 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models Junnan Li Dongxu Li Silvio Savarese Steven C. H. Hoi VLM MLLM 244 4,186 0 30 Jan 2023
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training Qinghao Ye Guohai Xu Ming Yan Haiyang Xu Qi Qian Ji Zhang Fei Huang VLM AI4TS 152 69 0 30 Dec 2022
Real-World Robot Learning with Masked Visual Pre-training Ilija Radosavovic Tete Xiao Stephen James Pieter Abbeel Jitendra Malik Trevor Darrell SSL 141 238 0 06 Oct 2022
Grounding Language with Visual Affordances over Unstructured Data Oier Mees Jessica Borja-Diaz Wolfram Burgard LM&Ro 111 106 0 04 Oct 2022
Video Graph Transformer for Video Question Answering Junbin Xiao Pan Zhou Tat-Seng Chua Shuicheng Yan ViT 131 73 0 12 Jul 2022
Watch and Match: Supercharging Imitation with Regularized Optimal Transport Siddhant Haldar Vaibhav Mathur Denis Yarats Lerrel Pinto 38 49 0 30 Jun 2022