A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering

14 January 2022

Govind Thattai

Aishwarya N. Reganti

Premkumar Natarajan

Papers citing "A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering"

From Generator to Embedder: Harnessing Innate Abilities of Multimodal LLMs via Building Zero-Shot Discriminative Embedding Model · Yeong-Joon Ju, Seong-Whan Lee · 156 4 0 · 01 Aug 2025
Fine-Grained Knowledge Structuring and Retrieval for Visual Question Answering · Zhengxuan Zhang, Yin Wu, Yuyu Luo, Nan Tang · 286 0 0 · 28 Feb 2025
FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA · S M Sarwar · 350 2 0 · 25 Feb 2025
Retrieval-Enhanced Machine Learning: Synthesis and Opportunities · To Eun Kim, Alireza Salemi, Andrew Drozdov, Fernando Diaz, Hamed Zamani · 319 10 0 · 17 Jul 2024
Generative Multi-Modal Knowledge Retrieval with Large Language Models · AAAI Conference on Artificial Intelligence (AAAI), 2024 · Xinwei Long, Jiali Zeng, Fandong Meng, Zhiyuan Ma, Kaiyan Zhang, Bowen Zhou, Jie Zhou · 186 32 0 · 16 Jan 2024
Open-Set Knowledge-Based Visual Question Answering with Inference Paths · Jingru Gan, Xinzhe Han, Shuhui Wang, Qingming Huang · 125 1 0 · 12 Oct 2023
End-to-end Knowledge Retrieval with Multi-modal Queries · Annual Meeting of the Association for Computational Linguistics (ACL), 2023 · Man Luo, Zhiyuan Fang, Tejas Gokhale, Yezhou Yang, Chitta Baral · VLM · 160 30 0 · 01 Jun 2023
Graph Neural Networks in Vision-Language Image Understanding: A Survey · The Visual Computer (TVC), 2023 · Henry Senior, Greg Slabaugh, Shanxin Yuan, Luca Rossi · GNN · 252 30 0 · 07 Mar 2023
Towards Reasoning-Aware Explainable VQA · Rakesh Vaideeswaran, Feng Gao, Abhinav Mathur, Govind Thattai · LRM · 169 4 0 · 09 Nov 2022
Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering · Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 · Jialin Wu, Raymond J. Mooney · RALM · 220 13 0 · 18 Oct 2022
Retrieval Augmented Visual Question Answering with Outside Knowledge · Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 · Weizhe Lin, Bill Byrne · RALM · 201 106 0 · 07 Oct 2022
LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection · Zhuo Chen, Yufen Huang, Jiaoyan Chen, Yuxia Geng, Yin Fang, Jeff Z. Pan, Ningyu Zhang, Wen Zhang · 181 47 0 · 26 Jul 2022
CRIC: A VQA Dataset for Compositional Reasoning on Vision and Commonsense · IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019 · Difei Gao, Ruiping Wang, Shiguang Shan, Xilin Chen · CoGe LRM · 216 36 0 · 08 Aug 2019
Billion-scale similarity search with GPUs · IEEE Transactions on Big Data (TBD), 2017 · Jeff Johnson, Matthijs Douze, Edouard Grave · 833 4,404 0 · 28 Feb 2017