Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1902.09506
Cited By
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
25 February 2019
Drew A. Hudson
Christopher D. Manning
CoGe
NAI
Re-assign community
ArXiv
PDF
HTML
Papers citing
"GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering"
20 / 20 papers shown
Title
Conformal Prediction and MLLM aided Uncertainty Quantification in Scene Graph Generation
Sayak Nag
Udita Ghosh
Sarosij Bose
Calvin-Khang Ta
Jiachen Li
A. Roy-Chowdhury
61
0
0
18 Mar 2025
Towards Efficient and Robust VQA-NLE Data Generation with Large Vision-Language Models
Patrick Amadeus Irawan
Genta Indra Winata
Samuel Cahyawijaya
Ayu Purwarianti
25
0
0
23 Sep 2024
AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One
Michael Ranzinger
Greg Heinrich
Jan Kautz
Pavlo Molchanov
VLM
31
42
0
10 Dec 2023
Localized Questions in Medical Visual Question Answering
Sergio Tascon-Morales
Pablo Márquez-Neila
Raphael Sznitman
17
8
0
03 Jul 2023
Joint Adaptive Representations for Image-Language Learning
A. Piergiovanni
A. Angelova
VLM
21
0
0
31 May 2023
Effective End-to-End Vision Language Pretraining with Semantic Visual Loss
Xiaofeng Yang
Fayao Liu
Guosheng Lin
VLM
19
7
0
18 Jan 2023
ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for Understanding and Generation
Bin Shan
Yaqian Han
Weichong Yin
Shuohuan Wang
Yu Sun
Hao Tian
Hua-Hong Wu
Haifeng Wang
MLLM
VLM
11
7
0
09 Nov 2022
Consistency-preserving Visual Question Answering in Medical Imaging
Sergio Tascon-Morales
Pablo Márquez-Neila
Raphael Sznitman
MedIm
17
12
0
27 Jun 2022
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
A. Piergiovanni
Wei Li
Weicheng Kuo
M. Saffar
Fred Bertsch
A. Angelova
17
16
0
02 May 2022
When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs
Oana Ignat
Santiago Castro
Yuhang Zhou
Jiajun Bao
Dandan Shan
Rada Mihalcea
18
3
0
16 Feb 2022
ICDAR 2021 Competition on Document VisualQuestion Answering
Rubèn Pérez Tito
Minesh Mathew
C. V. Jawahar
Ernest Valveny
Dimosthenis Karatzas
30
23
0
10 Nov 2021
InfographicVQA
Minesh Mathew
Viraj Bagal
Rubèn Pérez Tito
Dimosthenis Karatzas
Ernest Valveny
C. V. Jawahar
16
202
0
26 Apr 2021
Causal Attention for Vision-Language Tasks
Xu Yang
Hanwang Zhang
Guojun Qi
Jianfei Cai
CML
23
148
0
05 Mar 2021
Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning
Zhiyuan Fang
Tejas Gokhale
Pratyay Banerjee
Chitta Baral
Yezhou Yang
10
60
0
11 Mar 2020
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Jingxiang Lin
Unnat Jain
A. Schwing
LRM
ReLM
26
9
0
31 Oct 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Mohit Bansal
VLM
MLLM
52
2,447
0
20 Aug 2019
Fusion of Detected Objects in Text for Visual Question Answering
Chris Alberti
Jeffrey Ling
Michael Collins
David Reitter
6
173
0
14 Aug 2019
An Empirical Study on Leveraging Scene Graphs for Visual Question Answering
Cheng Zhang
Wei-Lun Chao
D. Xuan
21
50
0
28 Jul 2019
Learning To Follow Directions in Street View
Karl Moritz Hermann
Mateusz Malinowski
Piotr Wojciech Mirowski
Andras Banki-Horvath
Keith Anderson
R. Hadsell
SSL
13
66
0
01 Mar 2019
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
144
1,464
0
06 Jun 2016
1