
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

2 December 2016
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
    CoGe

Papers citing "Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering"

50 / 2,273 papers shown
The meaning of "most" for visual question answering models
A. Kuhnle
Ann A. Copestake
130
4
0
31 Dec 2018
Scene Graph Reasoning with Prior Visual Relationship for Visual Question Answering
Zhuoqian Yang
Zengchang Qin
Jing Yu
Yue Hu
GNN
127
16
0
23 Dec 2018
From FiLM to Video: Multi-turn Question Answering with Multi-modal Context
T. Nguyen
Shikhar Sharma
Hannes Schulz
Layla El Asri
122
34
0
17 Dec 2018
Visual Social Relationship Recognition
Junnan Li
Yongkang Wong
Qi Zhao
Mohan Kankanhalli
111
28
0
13 Dec 2018
Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering
Shiyang Feng
Zhengkai Jiang
Haoxuan You
Pan Lu
Steven C. H. Hoi
Xiaogang Wang
Jiaming Song
AIMat
428
393
0
13 Dec 2018
Learning Representations of Sets through Optimized Permutations
Yan Zhang
Jonathon S. Hare
Adam Prugel-Bennett
SSL
159
28
0
10 Dec 2018
Learning to Compose Dynamic Tree Structures for Visual Contexts
Kaihua Tang
Hanwang Zhang
Baoyuan Wu
Tong Lu
Wen Liu
249
550
0
05 Dec 2018
Explainable and Explicit Visual Reasoning over Scene Graphs
Jiaxin Shi
Hanwang Zhang
Juan-Zi Li
OCL
418
250
0
05 Dec 2018
Learning to Explain with Complemental Examples
Atsushi Kanehira
Tatsuya Harada
187
43
0
04 Dec 2018
Multimodal Explanations by Predicting Counterfactuality in Videos
Atsushi Kanehira
Kentaro Takemoto
S. Inayoshi
Tatsuya Harada
123
41
0
04 Dec 2018
Multi-task Learning of Hierarchical Vision-Language Representation
Duy-Kien Nguyen
Takayuki Okatani
228
56
0
03 Dec 2018
Learning to Caption Images through a Lifetime by Asking Questions
Tingke Shen
Amlan Kar
Sanja Fidler
222
31
0
01 Dec 2018
From Known to the Unknown: Transferring Knowledge to Answer Questions about Novel Visual and Semantic Concepts
M. Farazi
Salman H Khan
Nick Barnes
132
13
0
30 Nov 2018
Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments
Howard Chen
Alane Suhr
Dipendra Kumar Misra
Noah Snavely
Yoav Artzi
481
435
0
29 Nov 2018
From Recognition to Cognition: Visual Commonsense Reasoning
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
LRM BDL OCL ReLM
588
984
0
27 Nov 2018
Visual Entailment Task for Visually-Grounded Language Learning
Ning Xie
Farley Lai
Derek Doran
Asim Kadav
121
59
0
26 Nov 2018
VQA with no questions-answers training
Computer Vision and Pattern Recognition (CVPR), 2018
B. Vatashsky
S. Ullman
208
13
0
20 Nov 2018
Explicit Bias Discovery in Visual Question Answering Models
Computer Vision and Pattern Recognition (CVPR), 2018
Varun Manjunatha
Nirat Saini
L. Davis
CML FAtt
180
96
0
19 Nov 2018
On transfer learning using a MAC model variant
Vincent Marois
T. S. Jayram
V. Albouy
Tomasz Kornuta
Younes Bouhadjar
A. Ozcan
DRL
194
9
0
15 Nov 2018
Holistic Multi-modal Memory Network for Movie Question Answering
Anran Wang
Anh Tuan Luu
Chuan-Sheng Foo
Erik Cambria
Yi Tay
V. Chandrasekhar
171
20
0
12 Nov 2018
Shifting the Baseline: Single Modality Performance on Visual Navigation & QA
Jesse Thomason
Daniel Gordon
Yonatan Bisk
254
80
0
01 Nov 2018
A Corpus for Reasoning About Natural Language Grounded in Photographs
Alane Suhr
Stephanie Zhou
Ally Zhang
Iris Zhang
Huajun Bai
Yoav Artzi
LRM
417
670
0
01 Nov 2018
TallyQA: Answering Complex Counting Questions
Manoj Acharya
Kushal Kafle
Christopher Kanan
220
162
0
29 Oct 2018
Do Explanations make VQA Models more Predictable to a Human?
Arjun Chandrasekaran
Viraj Prabhu
Deshraj Yadav
Prithvijit Chattopadhyay
Devi Parikh
FAtt
226
102
0
29 Oct 2018
Understand, Compose and Respond - Answering Visual Questions by a Composition of Abstract Procedures
B. Vatashsky
S. Ullman
CoGe
126
2
0
25 Oct 2018
Knowing Where to Look? Analysis on Attention of Visual Question Answering System
Wei Li
Zehuan Yuan
Xiangzhong Fang
Changhu Wang
94
8
0
09 Oct 2018
Overcoming Language Priors in Visual Question Answering with Adversarial Regularization
S. Ramakrishnan
Aishwarya Agrawal
Stefan Lee
AAML
221
259
0
08 Oct 2018
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
Kexin Yi
Jiajun Wu
Chuang Gan
Antonio Torralba
Pushmeet Kohli
J. Tenenbaum
NAI
286
654
0
04 Oct 2018
Transfer Learning via Unsupervised Task Discovery for Visual Question Answering
Hyeonwoo Noh
Taehoon Kim
Jonghwan Mun
Bohyung Han
192
17
0
03 Oct 2018
The Wisdom of MaSSeS: Majority, Subjectivity, and Semantic Similarity in the Evaluation of VQA
Shailza Jolly
Sandro Pezzelle
T. Klein
Andreas Dengel
Moin Nabi
88
2
0
12 Sep 2018
How clever is the FiLM model, and how clever can it be?
A. Kuhnle
Huiyuan Xie
Ann A. Copestake
151
7
0
09 Sep 2018
What If We Simply Swap the Two Text Fragments? A Straightforward yet Effective Way to Test the Robustness of Methods to Confounding Signals in Nature Language Inference Tasks
Haohan Wang
Da-You Sun
Eric Xing
210
42
0
07 Sep 2018
Visual Coreference Resolution in Visual Dialog using Neural Module Networks
Satwik Kottur
José M. F. Moura
Devi Parikh
Dhruv Batra
Marcus Rohrbach
186
168
0
06 Sep 2018
Interpretable Visual Question Answering by Reasoning on Dependency Trees
Qingxing Cao
Bailin Li
Xiaodan Liang
Liang Lin
176
56
0
06 Sep 2018
Straight to the Facts: Learning Knowledge Base Retrieval for Factual Visual Question Answering
Medhini Narasimhan
Alex Schwing
175
111
0
04 Sep 2018
RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes
Semih Yagcioglu
Aykut Erdem
Erkut Erdem
Nazli Ikizler-Cinbis
CoGe
157
184
0
04 Sep 2018
The Gap of Semantic Parsing: A Survey on Automatic Math Word Problem Solvers
Dongxiang Zhang
Lei Wang
Nuo Xu
B. Dai
Heng Tao Shen
ReLM AIMat
168
140
0
22 Aug 2018
SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference
Rowan Zellers
Yonatan Bisk
Roy Schwartz
Yejin Choi
386
757
0
16 Aug 2018
How Much Reading Does Reading Comprehension Require? A Critical Investigation of Popular Benchmarks
Divyansh Kaushik
Zachary Chase Lipton
ELM
240
237
0
14 Aug 2018
Community Regularization of Visually-Grounded Dialog
Akshat Agarwal
Swaminathan Gurumurthy
Vasu Sharma
M. Lewis
Katia Sycara
133
10
0
10 Aug 2018
A Joint Sequence Fusion Model for Video Question Answering and Retrieval
Youngjae Yu
Jongseok Kim
Gunhee Kim
228
381
0
07 Aug 2018
Learning Visual Question Answering by Bootstrapping Hard Attention
Mateusz Malinowski
Carl Doersch
Adam Santoro
Peter W. Battaglia
OOD
262
98
0
01 Aug 2018
Interpretable Visual Question Answering by Visual Grounding from Attention Supervision Mining
Yundong Zhang
Juan Carlos Niebles
Á. Soto
181
70
0
01 Aug 2018
Pythia v0.1: the Winning Entry to the VQA Challenge 2018
Yu Jiang
Vivek Natarajan
Xinlei Chen
Marcus Rohrbach
Dhruv Batra
Devi Parikh
VLM
297
207
0
26 Jul 2018
Explainable Neural Computation via Stack Neural Module Networks
Ronghang Hu
Jacob Andreas
Trevor Darrell
Kate Saenko
LRM OCL
315
204
0
23 Jul 2018
Question Relevance in Visual Question Answering
Prakruthi Prabhakar
Nitish Kulkarni
Linghao Zhang
90
7
0
23 Jul 2018
Dynamic Multimodal Instance Segmentation guided by natural language queries
European Conference on Computer Vision (ECCV), 2018
Edgar Margffoy-Tuay
Juan C. Pérez
Emilio Botero
Pablo Arbelaez
256
187
0
06 Jul 2018
Collaborative Annotation of Semantic Objects in Images with Multi-granularity Supervisions
ACM Multimedia (ACM MM), 2018
Lishi Zhang
Chenghan Fu
Jia Li
110
8
0
27 Jun 2018
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features
Chiori Hori
Huda AlAmri
Jue Wang
Gordon Wichern
Takaaki Hori
...
Raphael Gontijo-Lopes
Abhishek Das
Irfan Essa
Dhruv Batra
Devi Parikh
VGen
199
130
0
21 Jun 2018
Learning Conditioned Graph Structures for Interpretable Visual Question Answering
Will Norcliffe-Brown
Efstathios Vafeias
Sarah Parisot
GNN
257
250
0
19 Jun 2018