Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1505.00468
Cited By
VQA: Visual Question Answering
3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
Re-assign community
ArXiv
PDF
HTML
Papers citing
"VQA: Visual Question Answering"
50 / 792 papers shown
Title
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
Aishwarya Agrawal
Dhruv Batra
Devi Parikh
Aniruddha Kembhavi
OOD
51
581
0
01 Dec 2017
HoME: a Household Multimodal Environment
Simon Brodeur
Ethan Perez
Ankesh Anand
Florian Golemo
Luca Herranz-Celotti
Florian Strub
Jean Rouat
Hugo Larochelle
Aaron Courville
LM&Ro
26
103
0
29 Nov 2017
Convolutional Image Captioning
J. Aneja
Aditya Deshpande
A. Schwing
VLM
23
359
0
24 Nov 2017
Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent
Zhilin Yang
Saizheng Zhang
Jack Urbanek
Will Feng
Alexander H. Miller
Arthur Szlam
Douwe Kiela
Jason Weston
23
25
0
21 Nov 2017
Adversarial Attacks Beyond the Image Space
Xiaohui Zeng
Chenxi Liu
Yu-Siang Wang
Weichao Qiu
Lingxi Xie
Yu-Wing Tai
Chi-Keung Tang
Alan Yuille
AAML
25
145
0
20 Nov 2017
Crowdsourcing Question-Answer Meaning Representations
Julian Michael
Gabriel Stanovsky
Luheng He
Ido Dagan
Luke Zettlemoyer
19
78
0
16 Nov 2017
Object Referring in Visual Scene with Spoken Language
A. Vasudevan
Dengxin Dai
Luc Van Gool
29
18
0
10 Nov 2017
Active Learning for Visual Question Answering: An Empirical Study
Xiaoyu Lin
Devi Parikh
36
31
0
06 Nov 2017
Survey of Recent Advances in Visual Question Answering
Supriya Pandhre
Shagun Sodhani
8
14
0
24 Sep 2017
FiLM: Visual Reasoning with a General Conditioning Layer
Ethan Perez
Florian Strub
H. D. Vries
Vincent Dumoulin
Aaron Courville
FAtt
AIMat
OffRL
AI4CE
70
2,144
0
22 Sep 2017
Visual Question Generation as Dual Task of Visual Question Answering
Yikang Li
Nan Duan
Bolei Zhou
Xiao Chu
Wanli Ouyang
Xiaogang Wang
29
165
0
21 Sep 2017
Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning
Yang Xian
Yingli Tian
VLM
21
22
0
15 Sep 2017
Link the head to the "beak": Zero Shot Learning from Noisy Text Description at Part Precision
Mohamed Elhoseiny
Yizhe Zhu
Han Zhang
Ahmed Elgammal
VLM
30
132
0
04 Sep 2017
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
Chuang Gan
Yandong Li
Haoxiang Li
Chen Sun
Boqing Gong
19
126
0
15 Aug 2017
Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge
Damien Teney
Peter Anderson
Xiaodong He
A. Hengel
45
380
0
09 Aug 2017
Weakly Supervised Image Annotation and Segmentation with Objects and Attributes
Zhiyuan Shi
Yongxin Yang
Timothy M. Hospedales
Tao Xiang
11
46
0
08 Aug 2017
What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?
Marc Tanti
Albert Gatt
K. Camilleri
16
56
0
07 Aug 2017
Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering
Zhou Yu
Jun-chen Yu
Jianping Fan
Dacheng Tao
41
663
0
04 Aug 2017
Scene Graph Generation from Objects, Phrases and Region Captions
Yikang Li
Wanli Ouyang
Bolei Zhou
Kun Wang
Xiaogang Wang
21
499
0
31 Jul 2017
Tensor Fusion Network for Multimodal Sentiment Analysis
Amir Zadeh
Minghai Chen
Soujanya Poria
Erik Cambria
Louis-Philippe Morency
22
1,198
0
23 Jul 2017
DeepStory: Video Story QA by Deep Embedded Memory Networks
Kyung-Min Kim
Min-Oh Heo
Seongho Choi
Byoung-Tak Zhang
19
174
0
04 Jul 2017
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
Jiasen Lu
A. Kannan
Jianwei Yang
Devi Parikh
Dhruv Batra
BDL
15
136
0
05 Jun 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
13
2,856
0
26 May 2017
Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning
Q. Sun
Stefan Lee
Dhruv Batra
BDL
25
43
0
24 May 2017
MUTAN: Multimodal Tucker Fusion for Visual Question Answering
H. Ben-younes
Rémi Cadène
Matthieu Cord
Nicolas Thome
44
578
0
18 May 2017
Combating Human Trafficking with Deep Multimodal Models
Edmund Tong
Amir Zadeh
Cara Jones
Louis-Philippe Morency
13
51
0
08 May 2017
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
Alexis Conneau
Douwe Kiela
Holger Schwenk
Loïc Barrault
Antoine Bordes
AI4TS
SSL
14
2,093
0
05 May 2017
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures
Fanyi Xiao
Leonid Sigal
Yong Jae Lee
19
138
0
03 May 2017
The Forgettable-Watcher Model for Video Question Answering
Hongyang Xue
Zhou Zhao
Deng Cai
16
9
0
03 May 2017
Mapping Instructions and Visual Observations to Actions with Reinforcement Learning
Dipendra Kumar Misra
John Langford
Yoav Artzi
12
247
0
28 Apr 2017
An Analysis of Action Recognition Datasets for Language and Vision Tasks
Spandana Gella
Frank Keller
ObjD
14
11
0
24 Apr 2017
Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets
Wei-Lun Chao
Hexiang Hu
Fei Sha
22
37
0
24 Apr 2017
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
Y. Jang
Yale Song
Youngjae Yu
Youngjin Kim
Gunhee Kim
19
545
0
14 Apr 2017
Explaining the Unexplained: A CLass-Enhanced Attentive Response (CLEAR) Approach to Understanding Deep Neural Networks
Devinder Kumar
Alexander Wong
Graham W. Taylor
21
59
0
13 Apr 2017
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Liwei Wang
Yin Li
Jing-ling Huang
Svetlana Lazebnik
VLM
27
494
0
11 Apr 2017
It Takes Two to Tango: Towards Theory of AI's Mind
Arjun Chandrasekaran
Deshraj Yadav
Prithvijit Chattopadhyay
Viraj Prabhu
Devi Parikh
28
53
0
03 Apr 2017
Towards Building Large Scale Multimodal Domain-Aware Conversation Systems
Amrita Saha
Mitesh Khapra
Karthik Sankaranarayanan
21
8
0
01 Apr 2017
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
Albert Gatt
E. Krahmer
LM&MA
ELM
18
809
0
29 Mar 2017
An Analysis of Visual Question Answering Algorithms
Kushal Kafle
Christopher Kanan
19
230
0
28 Mar 2017
Recurrent Multimodal Interaction for Referring Image Segmentation
Chenxi Liu
Zhe-nan Lin
Xiaohui Shen
Jimei Yang
Xin Lu
Alan Yuille
EgoV
36
234
0
23 Mar 2017
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
Abhishek Das
Satwik Kottur
J. M. F. Moura
Stefan Lee
Dhruv Batra
OffRL
31
423
0
20 Mar 2017
Asymmetric Tri-training for Unsupervised Domain Adaptation
Kuniaki Saito
Yoshitaka Ushiku
Tatsuya Harada
26
580
0
27 Feb 2017
Visual Translation Embedding Network for Visual Relation Detection
Hanwang Zhang
Zawlin Kyaw
Shih-Fu Chang
Tat-Seng Chua
ViT
140
560
0
27 Feb 2017
Task-driven Visual Saliency and Attention-based Visual Question Answering
Yuetan Lin
Zhangyang Pang
Donghui Wang
Yueting Zhuang
27
26
0
22 Feb 2017
Person Search with Natural Language Description
Shuang Li
Tong Xiao
Hongsheng Li
Bolei Zhou
Dayu Yue
Xiaogang Wang
19
385
0
19 Feb 2017
Learning Visual N-Grams from Web Data
Ang Li
Allan Jabri
Armand Joulin
L. V. D. van der Maaten
VLM
12
136
0
29 Dec 2016
The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions
Peng Wang
Qi Wu
Chunhua Shen
A. Hengel
OOD
18
86
0
16 Dec 2016
Attentive Explanations: Justifying Decisions and Pointing to the Evidence
Dong Huk Park
Lisa Anne Hendricks
Zeynep Akata
Bernt Schiele
Trevor Darrell
Marcus Rohrbach
AAML
16
79
0
14 Dec 2016
ImageNet pre-trained models with batch normalization
Marcel Simon
E. Rodner
Joachim Denzler
VLM
SSeg
30
165
0
05 Dec 2016
Who is Mistaken?
Benjamin Eysenbach
Carl Vondrick
Antonio Torralba
19
15
0
04 Dec 2016
Previous
1
2
3
...
14
15
16
Next