Connecting Vision and Language with Video Localized Narratives

22 February 2023

Papers citing "Connecting Vision and Language with Video Localized Narratives"


Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos
M. S. Seyfioglu, Wisdom O. Ikezogwo, Fatemeh Ghezloo, Ranjay Krishna, Linda G. Shapiro
22 31 0; 07 Dec 2023

Quilt-1M: One Million Image-Text Pairs for Histopathology
Wisdom O. Ikezogwo, M. S. Seyfioglu, Fatemeh Ghezloo, Dylan Stefan Chan Geva, Fatwir Sheikh Mohammed, Pavan Kumar Anand, Ranjay Krishna, Linda G. Shapiro
CLIP VLM; 125 101 0; 20 Jun 2023

Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset
Ashish V. Thapliyal, Jordi Pont-Tuset, Xi Chen, Radu Soricut
VGen; 67 71 0; 25 May 2022

Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, ..., Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik
EgoV; 224 1,017 0; 13 Oct 2021

Panoptic Narrative Grounding
Cristina González, Nicolás Ayobi, Isabela Hernández, José Hernández, Jordi Pont-Tuset, Pablo Arbeláez
74 22 0; 10 Sep 2021

Zero-Shot Text-to-Image Generation
Aditya A. Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever
VLM; 253 4,735 0; 24 Feb 2021

ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky, Jia Deng, Hao Su, J. Krause, S. Satheesh, ..., A. Karpathy, A. Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei
VLM ObjD; 279 39,083 0; 01 Sep 2014