Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

2 December 2016
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
    CoGe

Papers citing "Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering"

50 / 2,273 papers shown
Deep Modular Co-Attention Networks for Visual Question Answering
Computer Vision and Pattern Recognition (CVPR), 2019
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
281
915
0
25 Jun 2019
RUBi: Reducing Unimodal Biases in Visual Question Answering
Neural Information Processing Systems (NeurIPS), 2019
Rémi Cadène
Corentin Dancette
H. Ben-younes
Matthieu Cord
Devi Parikh
CML
266
401
0
24 Jun 2019
Investigating Biases in Textual Entailment Datasets
Shawn Tan
Songlin Yang
Chin-Wei Huang
Aaron Courville
105
8
0
23 Jun 2019
Adversarial Regularization for Visual Question Answering: Strengths, Shortcomings, and Side Effects
Gabriel Grand
Yonatan Belinkov
172
70
0
20 Jun 2019
Improving Visual Question Answering by Referring to Generated Paragraph Captions
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Hyounghun Kim
Mohit Bansal
CoGe
106
21
0
14 Jun 2019
Mimic and Fool: A Task Agnostic Adversarial Attack
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2019
Akshay Chaturvedi
Utpal Garain
AAML
110
29
0
11 Jun 2019
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering
AAAI Conference on Artificial Intelligence (AAAI), 2019
Zhou Yu
D. Xu
Jun-chen Yu
Ting Yu
Zhou Zhao
Yueting Zhuang
Dacheng Tao
283
602
0
06 Jun 2019
Generating Question Relevant Captions to Aid Visual Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Jialin Wu
Zeyuan Hu
Raymond J. Mooney
220
45
0
03 Jun 2019
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
Computer Vision and Pattern Recognition (CVPR), 2019
Kenneth Marino
Mohammad Rastegari
Ali Farhadi
Roozbeh Mottaghi
550
1,347
0
31 May 2019
Scene Text Visual Question Answering
IEEE International Conference on Computer Vision (ICCV), 2019
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Lluís Gómez
Marçal Rusiñol
Ernest Valveny
C. V. Jawahar
Dimosthenis Karatzas
404
439
0
31 May 2019
What Makes Training Multi-Modal Classification Networks Hard?
Computer Vision and Pattern Recognition (CVPR), 2019
Weiyao Wang
Du Tran
Matt Feiszli
522
556
0
29 May 2019
Learning Dynamics of Attention: Human Prior for Interpretable Machine Reasoning
Neural Information Processing Systems (NeurIPS), 2019
Wonjae Kim
Yoonho Lee
191
6
0
28 May 2019
Structure Learning for Neural Module Networks
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Vardaan Pahuja
Jie Fu
Sarath Chandar
C. Pal
119
8
0
27 May 2019
Deep Reason: A Strong Baseline for Real-World Visual Reasoning
Chenfei Wu
Yanzhao Zhou
Gen Li
Nan Duan
Duyu Tang
Xiaojie Wang
LRM NAI ReLM
182
2
0
24 May 2019
Self-Critical Reasoning for Robust Visual Question Answering
Neural Information Processing Systems (NeurIPS), 2019
Jialin Wu
Raymond J. Mooney
OOD NAI
213
170
0
24 May 2019
AttentionRNN: A Structured Spatial Attention Mechanism
IEEE International Conference on Computer Vision (ICCV), 2019
Siddhesh Khandelwal
Leonid Sigal
173
3
0
22 May 2019
Multimodal Transformer with Multi-View Visual Representation for Image Captioning
Jun-chen Yu
Jing Li
Zhou Yu
Qingming Huang
ViT
180
424
0
20 May 2019
SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation
IEEE International Conference on Computer Vision (ICCV), 2019
Daniel Gordon
Abhishek Kadian
Devi Parikh
Judy Hoffman
Dhruv Batra
242
79
0
18 May 2019
Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations
Neural Information Processing Systems (NeurIPS), 2019
Fenglin Liu
Yuanxin Liu
Xuancheng Ren
Xiaodong He
Xu Sun
VLM
154
90
0
15 May 2019
Misleading Failures of Partial-input Baselines
Shi Feng
Eric Wallace
Jordan L. Boyd-Graber
199
0
0
14 May 2019
Quantifying and Alleviating the Language Prior Problem in Visual Question Answering
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2019
Yangyang Guo
Zhiyong Cheng
Liqiang Nie
Zichen Liu
Yinglong Wang
Mohan Kankanhalli
178
37
0
13 May 2019
Language-Conditioned Graph Networks for Relational Reasoning
IEEE International Conference on Computer Vision (ICCV), 2019
Ronghang Hu
Anna Rohrbach
Trevor Darrell
Kate Saenko
182
182
0
10 May 2019
TVQA+: Spatio-Temporal Grounding for Video Question Answering
Jie Lei
Licheng Yu
Tamara L. Berg
Mohit Bansal
202
252
0
25 Apr 2019
Challenges and Prospects in Vision and Language Research
Kushal Kafle
Robik Shrestha
Christopher Kanan
191
42
0
19 Apr 2019
Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts
Julia Kruk
Jonah Lubin
Karan Sikka
Xiaoyu Lin
Dan Jurafsky
Ajay Divakaran
250
106
0
19 Apr 2019
Towards VQA Models That Can Read
Amanpreet Singh
Vivek Natarajan
Meet Shah
Yu Jiang
Xinlei Chen
Dhruv Batra
Devi Parikh
Marcus Rohrbach
EgoV
573
1,675
0
18 Apr 2019
Learning to Collocate Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Jianfei Cai
123
84
0
18 Apr 2019
A Simple Baseline for Audio-Visual Scene-Aware Dialog
Idan Schwartz
Alex Schwing
Tamir Hazan
186
79
0
11 Apr 2019
Reasoning Visual Dialogs with Structural and Partial Observations
Zilong Zheng
Wenguan Wang
Siyuan Qi
Song-Chun Zhu
216
118
0
11 Apr 2019
Quizbowl: The Case for Incremental Question Answering
Pedro Rodriguez
Shi Feng
Mohit Iyyer
He He
Jordan L. Boyd-Graber
230
54
0
09 Apr 2019
Revisiting EmbodiedQA: A Simple Baseline and Beyond
Yuehua Wu
Lu Jiang
Yi Yang
LM&Ro
175
33
0
08 Apr 2019
Actively Seeking and Learning from Live Data
Damien Teney
Anton Van Den Hengel
OOD
124
22
0
05 Apr 2019
VQD: Visual Query Detection in Natural Scenes
Manoj Acharya
Karan Jariwala
Christopher Kanan
ObjD
180
18
0
04 Apr 2019
Context and Attribute Grounded Dense Captioning
Guojun Yin
Lu Sheng
Bin Liu
Nenghai Yu
Xiaogang Wang
Jing Shao
131
83
0
02 Apr 2019
Relation-Aware Graph Attention Network for Visual Question Answering
Linjie Li
Zhe Gan
Yu Cheng
Jingjing Liu
GNN
423
379
0
29 Mar 2019
Visual Query Answering by Entity-Attribute Graph Matching and Reasoning
Peixi Xiong
Huayi Zhan
Xin Eric Wang
Baivab Sinha
Ying Nian Wu
139
17
0
16 Mar 2019
Answer Them All! Toward Universal Visual Question Answering Models
Computer Vision and Pattern Recognition (CVPR), 2019
Robik Shrestha
Kushal Kafle
Christopher Kanan
282
86
0
01 Mar 2019
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
Drew A. Hudson
Christopher D. Manning
CoGe NAI
234
147
0
25 Feb 2019
MUREL: Multimodal Relational Reasoning for Visual Question Answering
Rémi Cadène
H. Ben-younes
Matthieu Cord
Nicolas Thome
LRM
229
295
0
25 Feb 2019
Cycle-Consistency for Robust Visual Question Answering
Meet Shah
Xinlei Chen
Marcus Rohrbach
Devi Parikh
OOD
172
198
0
15 Feb 2019
Can We Automate Diagrammatic Reasoning?
Pattern Recognition (Pattern Recognit.), 2019
Sk. Arif Ahmed
D. P. Dogra
S. Kar
P. Roy
D. Prasad
140
4
0
13 Feb 2019
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded
IEEE International Conference on Computer Vision (ICCV), 2019
Ramprasaath R. Selvaraju
Stefan Lee
Yilin Shen
Hongxia Jin
Shalini Ghosh
Larry Heck
Dhruv Batra
Devi Parikh
FAtt VLM
251
279
0
11 Feb 2019
EvalAI: Towards Better Evaluation Systems for AI Agents
Deshraj Yadav
Rishabh Jain
Harsh Agrawal
Prithvijit Chattopadhyay
Taranjeet Singh
Akash Jain
Shivkaran Singh
Stefan Lee
Dhruv Batra
ELM
158
66
0
10 Feb 2019
Explanation in Human-AI Systems: A Literature Meta-Review, Synopsis of Key Ideas and Publications, and Bibliography for Explainable AI
Shane T. Mueller
R. Hoffman
W. Clancey
Abigail Emrey
Gary Klein
XAI
213
304
0
05 Feb 2019
VrR-VG: Refocusing Visually-Relevant Relationships
Yuanzhi Liang
Yalong Bai
Wei Zhang
Xueming Qian
Li Zhu
Tao Mei
3DH
235
8
0
01 Feb 2019
BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection
AAAI Conference on Artificial Intelligence (AAAI), 2019
H. Ben-younes
Rémi Cadène
Nicolas Thome
Matthieu Cord
316
230
0
31 Jan 2019
Visual Entailment: A Novel Task for Fine-Grained Image Understanding
Ning Xie
Farley Lai
Derek Doran
Asim Kadav
CoGe
304
346
0
20 Jan 2019
Evaluating Text-to-Image Matching using Binary Image Selection (BISON)
Hexiang Hu
Ishan Misra
Laurens van der Maaten
162
24
0
19 Jan 2019
Response to "Visual Dialogue without Vision or Dialogue" (Massiceti et al., 2018)
Abhishek Das
Devi Parikh
Dhruv Batra
68
2
0
16 Jan 2019
CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions
Runtao Liu
Chenxi Liu
Yutong Bai
Alan Yuille
NAI ObjD
317
141
0
03 Jan 2019