Fully Authentic Visual Question Answering Dataset from Online
Communities

Fully Authentic Visual Question Answering Dataset from Online Communities

27 November 2023

Yunsheng Li

Lu Yuan

Papers citing "Fully Authentic Visual Question Answering Dataset from Online Communities"

13 / 13 papers shown

Title
Towards Understanding the Use of MLLM-Enabled Applications for Visual Interpretation by Blind and Low Vision People Ricardo E Gonzalez Penuela Ruiying Hu Sharon Lin Tanisha Shende Shiri Azenkot 44 1 0 07 Mar 2025
Accounting for Focus Ambiguity in Visual Questions Chongyan Chen Yu-Yun Tseng Zhuoheng Li Anush Venkatesh Danna Gurari 36 0 0 04 Jan 2025
A Survey of Multimodal Large Language Model from A Data-centric Perspective Tianyi Bai Hao Liang Binwang Wan Yanran Xu Xi Li ... Ping-Chia Huang Jiulong Shan Conghui He Binhang Yuan Wentao Zhang 47 36 0 26 May 2024
Gemini Pro Defeated by GPT-4V: Evidence from Education Gyeong-Geon Lee Ehsan Latif Lehong Shi Xiaoming Zhai 26 21 0 27 Dec 2023
An Evaluation of GPT-4V and Gemini in Online VQA Mengchen Liu Chongyan Chen Danna Gurari MLLM 53 7 0 17 Dec 2023
Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering N. Naik Christopher Potts Elisa Kreiss 17 3 0 28 Jul 2023
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images Nitzan Bitton-Guetta Yonatan Bitton Jack Hessel Ludwig Schmidt Yuval Elovici Gabriel Stanovsky Roy Schwartz VLM 121 65 0 13 Mar 2023
Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities Hexiang Hu Yi Luan Yang Chen Urvashi Khandelwal Mandar Joshi Kenton Lee Kristina Toutanova Ming-Wei Chang VLM 43 55 0 22 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models Junnan Li Dongxu Li Silvio Savarese Steven C. H. Hoi VLM MLLM 259 4,223 0 30 Jan 2023
Multitask Prompted Training Enables Zero-Shot Task Generalization Victor Sanh Albert Webson Colin Raffel Stephen H. Bach Lintang Sutawika ... T. Bers Stella Biderman Leo Gao Thomas Wolf Alexander M. Rush LRM 205 1,654 0 15 Oct 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts Soravit Changpinyo P. Sharma Nan Ding Radu Soricut VLM 273 1,077 0 17 Feb 2021
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies Mor Geva Daniel Khashabi Elad Segal Tushar Khot Dan Roth Jonathan Berant RALM 245 671 0 06 Jan 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling Leo Gao Stella Biderman Sid Black Laurence Golding Travis Hoppe ... Horace He Anish Thite Noa Nabeshima Shawn Presser Connor Leahy AIMat 245 1,977 0 31 Dec 2020