Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1505.01121
Cited By
Ask Your Neurons: A Neural-based Approach to Answering Questions about Images
5 May 2015
Mateusz Malinowski
Marcus Rohrbach
Mario Fritz
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Ask Your Neurons: A Neural-based Approach to Answering Questions about Images"
50 / 69 papers shown
Title
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao
Pan Zhou
Difei Gao
Zechen Bai
Mike Zheng Shou
77
8
0
21 Feb 2025
Text-Guided Coarse-to-Fine Fusion Network for Robust Remote Sensing Visual Question Answering
Zhicheng Zhao
Changfu Zhou
Yu Zhang
Chenglong Li
Xiaoliang Ma
Jin Tang
76
0
0
24 Nov 2024
LOIS: Looking Out of Instance Semantics for Visual Question Answering
Siyu Zhang
Ye Chen
Yaoru Sun
Fang Wang
Haibo Shi
Haoran Wang
23
4
0
26 Jul 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
Lingxi Xie
Longhui Wei
Xiaopeng Zhang
Kaifeng Bi
Xiaotao Gu
Jianlong Chang
Qi Tian
29
7
0
14 Jun 2023
AlignVE: Visual Entailment Recognition Based on Alignment Relations
Biwei Cao
Jiuxin Cao
Jie Gui
Jiayun Shen
Bo Liu
Lei He
Yuan Yan Tang
James T. Kwok
18
7
0
16 Nov 2022
From Pixels to Objects: Cubic Visual Attention for Visual Question Answering
Jingkuan Song
Pengpeng Zeng
Lianli Gao
Heng Tao Shen
19
62
0
04 Jun 2022
Attention Mechanism based Cognition-level Scene Understanding
Xuejiao Tang
Tai Le Quy
LRM
23
0
0
17 Apr 2022
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
Ekta Sood
Fabian Kögel
Florian Strohm
Prajit Dhar
Andreas Bulling
29
19
0
27 Sep 2021
Longer Version for "Deep Context-Encoding Network for Retinal Image Captioning"
Jia-Hong Huang
Ting-Wei Wu
Chao-Han Huck Yang
M. Worring
MedIm
13
28
0
30 May 2021
VMSMO: Learning to Generate Multimodal Summary for Video-based News Articles
Li Mingzhe
Xiuying Chen
Shen Gao
Zhangming Chan
Dongyan Zhao
Rui Yan
17
82
0
12 Oct 2020
Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering
Zihao Zhu
J. Yu
Yujing Wang
Yajing Sun
Yue Hu
Qi Wu
17
125
0
16 Jun 2020
Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing
Vedika Agarwal
Rakshith Shetty
Mario Fritz
CML
AAML
19
155
0
16 Dec 2019
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Jingxiang Lin
Unnat Jain
A. Schwing
LRM
ReLM
26
9
0
31 Oct 2019
Compact Trilinear Interaction for Visual Question Answering
Tuong Khanh Long Do
Thanh-Toan Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
26
59
0
26 Sep 2019
Aesthetic Image Captioning From Weakly-Labelled Photographs
Koustav Ghosal
A. Rana
A. Smolic
17
25
0
29 Aug 2019
Why Does a Visual Question Have Different Answers?
Nilavra Bhattacharya
Qing Li
Danna Gurari
15
65
0
12 Aug 2019
Adversarial Multimodal Network for Movie Question Answering
Zhaoquan Yuan
Siyuan Sun
Lixin Duan
Xiao Wu
Changsheng Xu
19
3
0
24 Jun 2019
Factor Graph Attention
Idan Schwartz
Seunghak Yu
Tamir Hazan
A. Schwing
19
110
0
11 Apr 2019
A Simple Baseline for Audio-Visual Scene-Aware Dialog
Idan Schwartz
A. Schwing
Tamir Hazan
19
69
0
11 Apr 2019
Constructing Hierarchical Q&A Datasets for Video Story Understanding
Y. Heo
Kyoung-Woon On
Seong-Ho Choi
Jaeseo Lim
Jinah Kim
Jeh-Kwang Ryu
Byung-Chull Bae
Byoung-Tak Zhang
15
4
0
01 Apr 2019
Evaluating Text-to-Image Matching using Binary Image Selection (BISON)
Hexiang Hu
Ishan Misra
L. V. D. van der Maaten
16
22
0
19 Jan 2019
Semi-interactive Attention Network for Answer Understanding in Reverse-QA
Qing Yin
Guan Luo
Xiao-Dan Zhu
Q. Hu
Ou Wu
16
5
0
12 Jan 2019
Textually Enriched Neural Module Networks for Visual Question Answering
Khyathi Raghavi Chandu
Mary Arpita Pyreddy
Matthieu Felix
N. Joshi
24
6
0
23 Sep 2018
Interpretable Visual Question Answering by Reasoning on Dependency Trees
Qingxing Cao
Bailin Li
Xiaodan Liang
Liang Lin
20
55
0
06 Sep 2018
Multimodal Grounding for Language Processing
Lisa Beinborn
Teresa Botschen
Iryna Gurevych
14
32
0
17 Jun 2018
Learning Visual Knowledge Memory Networks for Visual Question Answering
Zhou Su
Chen Zhu
Yinpeng Dong
Dongqi Cai
Yurong Chen
Jianguo Li
27
62
0
13 Jun 2018
Joint Image Captioning and Question Answering
Jialin Wu
Zeyuan Hu
Raymond J. Mooney
22
12
0
22 May 2018
Category-Based Deep CCA for Fine-Grained Venue Discovery from Multimodal Data
Yi Yu
Suhua Tang
Kiyoharu Aizawa
Akiko Aizawa
4
100
0
08 May 2018
Unsupervised Textual Grounding: Linking Words to Image Concepts
Raymond A. Yeh
Minh Do
A. Schwing
14
40
0
29 Mar 2018
Motion-Appearance Co-Memory Networks for Video Question Answering
J. Gao
Runzhou Ge
Kan Chen
Ram Nevatia
13
240
0
29 Mar 2018
A Survey on Deep Learning Methods for Robot Vision
Javier Ruiz-del-Solar
P. Loncomilla
Naiomi Soto
26
60
0
28 Mar 2018
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence
Dong Huk Park
Lisa Anne Hendricks
Zeynep Akata
Anna Rohrbach
Bernt Schiele
Trevor Darrell
Marcus Rohrbach
35
418
0
15 Feb 2018
Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions
Qing Li
Jianlong Fu
D. Yu
Tao Mei
Jiebo Luo
FAtt
XAI
CoGe
43
60
0
27 Jan 2018
Active Learning for Visual Question Answering: An Empirical Study
Xiaoyu Lin
Devi Parikh
31
31
0
06 Nov 2017
FiLM: Visual Reasoning with a General Conditioning Layer
Ethan Perez
Florian Strub
H. D. Vries
Vincent Dumoulin
Aaron Courville
FAtt
AIMat
OffRL
AI4CE
61
2,142
0
22 Sep 2017
Visual Question Generation as Dual Task of Visual Question Answering
Yikang Li
Nan Duan
Bolei Zhou
Xiao Chu
Wanli Ouyang
Xiaogang Wang
24
165
0
21 Sep 2017
Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge
Damien Teney
Peter Anderson
Xiaodong He
A. Hengel
45
380
0
09 Aug 2017
Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering
Zhou Yu
Jun-chen Yu
Jianping Fan
Dacheng Tao
16
663
0
04 Aug 2017
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
Jiasen Lu
A. Kannan
Jianwei Yang
Devi Parikh
Dhruv Batra
BDL
13
136
0
05 Jun 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
11
2,855
0
26 May 2017
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
Y. Jang
Yale Song
Youngjae Yu
Youngjin Kim
Gunhee Kim
19
545
0
14 Apr 2017
Explaining the Unexplained: A CLass-Enhanced Attentive Response (CLEAR) Approach to Understanding Deep Neural Networks
Devinder Kumar
Alexander Wong
Graham W. Taylor
16
59
0
13 Apr 2017
An Analysis of Visual Question Answering Algorithms
Kushal Kafle
Christopher Kanan
19
230
0
28 Mar 2017
Recurrent Multimodal Interaction for Referring Image Segmentation
Chenxi Liu
Zhe-nan Lin
Xiaohui Shen
Jimei Yang
Xin Lu
Alan Yuille
EgoV
28
234
0
23 Mar 2017
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
Abhishek Das
Satwik Kottur
J. M. F. Moura
Stefan Lee
Dhruv Batra
OffRL
31
423
0
20 Mar 2017
Task-driven Visual Saliency and Attention-based Visual Question Answering
Yuetan Lin
Zhangyang Pang
Donghui Wang
Yueting Zhuang
19
26
0
22 Feb 2017
Person Search with Natural Language Description
Shuang Li
Tong Xiao
Hongsheng Li
Bolei Zhou
Dayu Yue
Xiaogang Wang
14
385
0
19 Feb 2017
The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions
Peng Wang
Qi Wu
Chunhua Shen
A. Hengel
OOD
18
86
0
16 Dec 2016
Attentive Explanations: Justifying Decisions and Pointing to the Evidence
Dong Huk Park
Lisa Anne Hendricks
Zeynep Akata
Bernt Schiele
Trevor Darrell
Marcus Rohrbach
AAML
16
79
0
14 Dec 2016
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
97
3,115
0
02 Dec 2016
1
2
Next