ResearchTrend.AI
VQA: Visual Question Answering


3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe

Papers citing "VQA: Visual Question Answering"

50 / 792 papers shown
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
Aishwarya Agrawal
Dhruv Batra
Devi Parikh
Aniruddha Kembhavi
OOD
51
581
0
01 Dec 2017
HoME: a Household Multimodal Environment
Simon Brodeur
Ethan Perez
Ankesh Anand
Florian Golemo
Luca Herranz-Celotti
Florian Strub
Jean Rouat
Hugo Larochelle
Aaron Courville
LM&Ro
26
103
0
29 Nov 2017
Convolutional Image Captioning
J. Aneja
Aditya Deshpande
A. Schwing
VLM
23
359
0
24 Nov 2017
Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent
Zhilin Yang
Saizheng Zhang
Jack Urbanek
Will Feng
Alexander H. Miller
Arthur Szlam
Douwe Kiela
Jason Weston
23
25
0
21 Nov 2017
Adversarial Attacks Beyond the Image Space
Xiaohui Zeng
Chenxi Liu
Yu-Siang Wang
Weichao Qiu
Lingxi Xie
Yu-Wing Tai
Chi-Keung Tang
Alan Yuille
AAML
25
145
0
20 Nov 2017
Crowdsourcing Question-Answer Meaning Representations
Julian Michael
Gabriel Stanovsky
Luheng He
Ido Dagan
Luke Zettlemoyer
19
78
0
16 Nov 2017
Object Referring in Visual Scene with Spoken Language
A. Vasudevan
Dengxin Dai
Luc Van Gool
29
18
0
10 Nov 2017
Active Learning for Visual Question Answering: An Empirical Study
Xiaoyu Lin
Devi Parikh
36
31
0
06 Nov 2017
Survey of Recent Advances in Visual Question Answering
Supriya Pandhre
Shagun Sodhani
8
14
0
24 Sep 2017
FiLM: Visual Reasoning with a General Conditioning Layer
Ethan Perez
Florian Strub
Harm de Vries
Vincent Dumoulin
Aaron Courville
FAtt
AIMat
OffRL
AI4CE
70
2,144
0
22 Sep 2017
Visual Question Generation as Dual Task of Visual Question Answering
Yikang Li
Nan Duan
Bolei Zhou
Xiao Chu
Wanli Ouyang
Xiaogang Wang
29
165
0
21 Sep 2017
Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning
Yang Xian
Yingli Tian
VLM
21
22
0
15 Sep 2017
Link the head to the "beak": Zero Shot Learning from Noisy Text Description at Part Precision
Mohamed Elhoseiny
Yizhe Zhu
Han Zhang
Ahmed Elgammal
VLM
30
132
0
04 Sep 2017
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
Chuang Gan
Yandong Li
Haoxiang Li
Chen Sun
Boqing Gong
19
126
0
15 Aug 2017
Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge
Damien Teney
Peter Anderson
Xiaodong He
Anton van den Hengel
45
380
0
09 Aug 2017
Weakly Supervised Image Annotation and Segmentation with Objects and Attributes
Zhiyuan Shi
Yongxin Yang
Timothy M. Hospedales
Tao Xiang
11
46
0
08 Aug 2017
What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?
Marc Tanti
Albert Gatt
K. Camilleri
16
56
0
07 Aug 2017
Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering
Zhou Yu
Jun-chen Yu
Jianping Fan
Dacheng Tao
41
663
0
04 Aug 2017
Scene Graph Generation from Objects, Phrases and Region Captions
Yikang Li
Wanli Ouyang
Bolei Zhou
Kun Wang
Xiaogang Wang
21
499
0
31 Jul 2017
Tensor Fusion Network for Multimodal Sentiment Analysis
Amir Zadeh
Minghai Chen
Soujanya Poria
Erik Cambria
Louis-Philippe Morency
22
1,198
0
23 Jul 2017
DeepStory: Video Story QA by Deep Embedded Memory Networks
Kyung-Min Kim
Min-Oh Heo
Seongho Choi
Byoung-Tak Zhang
19
174
0
04 Jul 2017
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
Jiasen Lu
A. Kannan
Jianwei Yang
Devi Parikh
Dhruv Batra
BDL
15
136
0
05 Jun 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
13
2,856
0
26 May 2017
Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning
Q. Sun
Stefan Lee
Dhruv Batra
BDL
25
43
0
24 May 2017
MUTAN: Multimodal Tucker Fusion for Visual Question Answering
H. Ben-younes
Rémi Cadène
Matthieu Cord
Nicolas Thome
44
578
0
18 May 2017
Combating Human Trafficking with Deep Multimodal Models
Edmund Tong
Amir Zadeh
Cara Jones
Louis-Philippe Morency
13
51
0
08 May 2017
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
Alexis Conneau
Douwe Kiela
Holger Schwenk
Loïc Barrault
Antoine Bordes
AI4TS
SSL
14
2,093
0
05 May 2017
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures
Fanyi Xiao
Leonid Sigal
Yong Jae Lee
19
138
0
03 May 2017
The Forgettable-Watcher Model for Video Question Answering
Hongyang Xue
Zhou Zhao
Deng Cai
16
9
0
03 May 2017
Mapping Instructions and Visual Observations to Actions with Reinforcement Learning
Dipendra Kumar Misra
John Langford
Yoav Artzi
12
247
0
28 Apr 2017
An Analysis of Action Recognition Datasets for Language and Vision Tasks
Spandana Gella
Frank Keller
ObjD
14
11
0
24 Apr 2017
Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets
Wei-Lun Chao
Hexiang Hu
Fei Sha
22
37
0
24 Apr 2017
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
Y. Jang
Yale Song
Youngjae Yu
Youngjin Kim
Gunhee Kim
19
545
0
14 Apr 2017
Explaining the Unexplained: A CLass-Enhanced Attentive Response (CLEAR) Approach to Understanding Deep Neural Networks
Devinder Kumar
Alexander Wong
Graham W. Taylor
21
59
0
13 Apr 2017
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Liwei Wang
Yin Li
Jing-ling Huang
Svetlana Lazebnik
VLM
27
494
0
11 Apr 2017
It Takes Two to Tango: Towards Theory of AI's Mind
Arjun Chandrasekaran
Deshraj Yadav
Prithvijit Chattopadhyay
Viraj Prabhu
Devi Parikh
28
53
0
03 Apr 2017
Towards Building Large Scale Multimodal Domain-Aware Conversation Systems
Amrita Saha
Mitesh Khapra
Karthik Sankaranarayanan
21
8
0
01 Apr 2017
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
Albert Gatt
E. Krahmer
LM&MA
ELM
18
809
0
29 Mar 2017
An Analysis of Visual Question Answering Algorithms
Kushal Kafle
Christopher Kanan
19
230
0
28 Mar 2017
Recurrent Multimodal Interaction for Referring Image Segmentation
Chenxi Liu
Zhe-nan Lin
Xiaohui Shen
Jimei Yang
Xin Lu
Alan Yuille
EgoV
36
234
0
23 Mar 2017
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
Abhishek Das
Satwik Kottur
J. M. F. Moura
Stefan Lee
Dhruv Batra
OffRL
31
423
0
20 Mar 2017
Asymmetric Tri-training for Unsupervised Domain Adaptation
Kuniaki Saito
Yoshitaka Ushiku
Tatsuya Harada
26
580
0
27 Feb 2017
Visual Translation Embedding Network for Visual Relation Detection
Hanwang Zhang
Zawlin Kyaw
Shih-Fu Chang
Tat-Seng Chua
ViT
140
560
0
27 Feb 2017
Task-driven Visual Saliency and Attention-based Visual Question Answering
Yuetan Lin
Zhangyang Pang
Donghui Wang
Yueting Zhuang
27
26
0
22 Feb 2017
Person Search with Natural Language Description
Shuang Li
Tong Xiao
Hongsheng Li
Bolei Zhou
Dayu Yue
Xiaogang Wang
19
385
0
19 Feb 2017
Learning Visual N-Grams from Web Data
Ang Li
Allan Jabri
Armand Joulin
Laurens van der Maaten
VLM
12
136
0
29 Dec 2016
The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions
Peng Wang
Qi Wu
Chunhua Shen
Anton van den Hengel
OOD
18
86
0
16 Dec 2016
Attentive Explanations: Justifying Decisions and Pointing to the Evidence
Dong Huk Park
Lisa Anne Hendricks
Zeynep Akata
Bernt Schiele
Trevor Darrell
Marcus Rohrbach
AAML
16
79
0
14 Dec 2016
ImageNet pre-trained models with batch normalization
Marcel Simon
E. Rodner
Joachim Denzler
VLM
SSeg
30
165
0
05 Dec 2016
Who is Mistaken?
Benjamin Eysenbach
Carl Vondrick
Antonio Torralba
19
15
0
04 Dec 2016