Stacked Attention Networks for Image Question Answering

7 November 2015
Zichao Yang
Xiaodong He
Jianfeng Gao
Li Deng
Alex Smola
    BDL

Papers citing "Stacked Attention Networks for Image Question Answering"

50 / 217 papers shown
Title
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
Ekta Sood
Fabian Kögel
Florian Strohm
Prajit Dhar
Andreas Bulling
31
19
0
27 Sep 2021
How to find a good image-text embedding for remote sensing visual question answering?
Christel Chappuis
Sylvain Lobry
B. Kellenberger
Bertrand Le Saux
D. Tuia
34
20
0
24 Sep 2021
Improving Joint Learning of Chest X-Ray and Radiology Report by Word Region Alignment
Zhanghexuan Ji
Mohammad Abuzar Shaikh
Dana Moukheiber
S. Srihari
Yifan Peng
Mingchen Gao
SSL
14
20
0
04 Sep 2021
Understanding the computational demands underlying visual reasoning
Mohit Vaishnav
Rémi Cadène
A. Alamia
Drew Linsley
Rufin VanRullen
Thomas Serre
GNN
CoGe
32
16
0
08 Aug 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Mohit Bansal
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
196
405
0
13 Jul 2021
Zero-shot Visual Question Answering using Knowledge Graph
Zhuo Chen
Jiaoyan Chen
Yuxia Geng
Jeff Z. Pan
Zonggang Yuan
Huajun Chen
15
70
0
12 Jul 2021
Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs
Daniel Reich
F. Putze
Tanja Schultz
22
2
0
28 Jun 2021
Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering
Ahjeong Seo
Gi-Cheon Kang
J. Park
Byoung-Tak Zhang
13
53
0
19 Jun 2021
Measuring and Improving BERT's Mathematical Abilities by Predicting the Order of Reasoning
Piotr Piękos
Henryk Michalewski
Mateusz Malinowski
22
28
0
07 Jun 2021
Multiple Meta-model Quantifying for Medical Visual Question Answering
Tuong Khanh Long Do
Binh X. Nguyen
Erman Tjiputra
Minh-Ngoc Tran
Quang-Dieu Tran
A. Nguyen
31
98
0
19 May 2021
InfographicVQA
Minesh Mathew
Viraj Bagal
Rubèn Pérez Tito
Dimosthenis Karatzas
Ernest Valveny
C. V. Jawahar
22
202
0
26 Apr 2021
AttWalk: Attentive Cross-Walks for Deep Mesh Analysis
Ran Ben Izhak
Alon Lahav
A. Tal
3DV
29
10
0
23 Apr 2021
Visual Navigation with Spatial Attention
Bar Mayo
Tamir Hazan
A. Tal
EgoV
19
72
0
20 Apr 2021
Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering
Corentin Dancette
Rémi Cadène
Damien Teney
Matthieu Cord
CML
28
75
0
07 Apr 2021
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
Zhicheng Huang
Zhaoyang Zeng
Yupan Huang
Bei Liu
Dongmei Fu
Jianlong Fu
VLM
ViT
34
271
0
07 Apr 2021
Dual Contrastive Loss and Attention for GANs
Ning Yu
Guilin Liu
Aysegül Dündar
Andrew Tao
Bryan Catanzaro
Larry S. Davis
Mario Fritz
GAN
24
60
0
31 Mar 2021
Incorporating Convolution Designs into Visual Transformers
Kun Yuan
Shaopeng Guo
Ziwei Liu
Aojun Zhou
F. Yu
Wei Wu
ViT
38
467
0
22 Mar 2021
Local Interpretations for Explainable Natural Language Processing: A Survey
Siwen Luo
Hamish Ivison
S. Han
Josiah Poon
MILM
33
48
0
20 Mar 2021
Decoupled Spatial Temporal Graphs for Generic Visual Grounding
Qi Feng
Yunchao Wei
Mingming Cheng
Yi Yang
24
5
0
18 Mar 2021
Causal Attention for Vision-Language Tasks
Xu Yang
Hanwang Zhang
Guojun Qi
Jianfei Cai
CML
28
148
0
05 Mar 2021
SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering
Bo Liu
Li-Ming Zhan
Li Xu
Lin Ma
Y. Yang
Xiao-Ming Wu
22
234
0
18 Feb 2021
Biomedical Question Answering: A Survey of Approaches and Challenges
Qiao Jin
Zheng Yuan
Guangzhi Xiong
Qian Yu
Huaiyuan Ying
Chuanqi Tan
Mosha Chen
Songfang Huang
Xiaozhong Liu
Sheng Yu
21
95
0
10 Feb 2021
Answer Questions with Right Image Regions: A Visual Attention Regularization Approach
Y. Liu
Yangyang Guo
Jianhua Yin
Xuemeng Song
Weifeng Liu
Liqiang Nie
24
28
0
03 Feb 2021
Latent Variable Models for Visual Question Answering
Zixu Wang
Yishu Miao
Lucia Specia
25
5
0
16 Jan 2021
Explainability of deep vision-based autonomous driving systems: Review and challenges
Éloi Zablocki
H. Ben-younes
P. Pérez
Matthieu Cord
XAI
37
169
0
13 Jan 2021
ORDNet: Capturing Omni-Range Dependencies for Scene Parsing
Shaofei Huang
Si Liu
Tianrui Hui
Jizhong Han
Bo-wen Li
Jiashi Feng
Shuicheng Yan
3DPC
OffRL
29
15
0
11 Jan 2021
MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification
Te-Lin Wu
Shikhar Singh
S. Paul
Gully A. Burns
Nanyun Peng
22
18
0
16 Dec 2020
WeaQA: Weak Supervision via Captions for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
17
34
0
04 Dec 2020
ATSal: An Attention Based Architecture for Saliency Prediction in 360 Videos
Y. A. D. Djilali
M. Tliba
Kevin McGuinness
Noel E. O'Connor
33
42
0
20 Nov 2020
An Improved Attention for Visual Question Answering
Tanzila Rahman
Shih-Han Chou
Leonid Sigal
Giuseppe Carenini
13
42
0
04 Nov 2020
Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games
Yunqiu Xu
Meng Fang
Ling-Hao Chen
Yali Du
Joey Tianyi Zhou
Chengqi Zhang
OffRL
25
44
0
22 Oct 2020
New Ideas and Trends in Deep Multimodal Content Understanding: A Review
Wei-Neng Chen
Weiping Wang
Li Liu
M. Lew
VLM
112
31
0
16 Oct 2020
A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports
Yikuan Li
Hanyin Wang
Yuan Luo
6
63
0
03 Sep 2020
Counting from Sky: A Large-scale Dataset for Remote Sensing Object Counting and A Benchmark Method
Guangshuai Gao
Qingjie Liu
Yunhong Wang
13
53
0
28 Aug 2020
AiR: Attention with Reasoning Capability
Shi Chen
Ming Jiang
Jinhui Yang
Qi Zhao
LRM
13
36
0
28 Jul 2020
Category-Specific CNN for Visual-aware CTR Prediction at JD.com
Hu Liu
Jing Lu
Hao Yang
Xiwei Zhao
Sulong Xu
...
Zehua Zhang
Wenjie Niu
Xiaokun Zhu
Yongjun Bao
Weipeng P. Yan
9
31
0
18 Jun 2020
Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering
Zihao Zhu
J. Yu
Yujing Wang
Yajing Sun
Yue Hu
Qi Wu
19
125
0
16 Jun 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD
VLM
26
487
0
11 Jun 2020
Estimating semantic structure for the VQA answer space
Corentin Kervadec
G. Antipov
M. Baccouche
Christian Wolf
18
4
0
10 Jun 2020
Hyperspectral Image Classification with Attention Aided CNNs
Renlong Hang
Zhu Li
Qingshan Liu
Pedram Ghamisi
Shuvra S. Bhattacharyya
7
225
0
25 May 2020
Modeling Human Visual Search Performance on Realistic Webpages Using Analytical and Deep Learning Methods
Arianna Yuan
Y. Li
HAI
17
24
0
07 May 2020
Exploring Self-attention for Image Recognition
Hengshuang Zhao
Jiaya Jia
V. Koltun
SSL
26
772
0
28 Apr 2020
Causal Interpretability for Machine Learning -- Problems, Methods and Evaluation
Raha Moraffah
Mansooreh Karami
Ruocheng Guo
A. Raglin
Huan Liu
CML
ELM
XAI
27
212
0
09 Mar 2020
Adaptive Offline Quintuplet Loss for Image-Text Matching
Tianlang Chen
Jiajun Deng
Jiebo Luo
181
68
0
07 Mar 2020
Dropout: Explicit Forms and Capacity Control
R. Arora
Peter L. Bartlett
Poorya Mianjy
Nathan Srebro
55
37
0
06 Mar 2020
RP-DNN: A Tweet level propagation context based deep neural networks for early rumor detection in Social Media
Jie Gao
Sooji Han
Xingyi Song
F. Ciravegna
8
20
0
28 Feb 2020
An Attention Transfer Model for Human-Assisted Failure Avoidance in Robot Manipulations
Boyi Song
Yu-Tang Peng
Ruijiao Luo
R. Liu
11
2
0
11 Feb 2020
Weakly Supervised Few-shot Object Segmentation using Co-Attention with Visual and Semantic Embeddings
Mennatullah Siam
Naren Doraiswamy
Boris N. Oreshkin
Hengshuai Yao
Martin Jägersand
21
8
0
26 Jan 2020
Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models
M. Farazi
Salman H. Khan
Nick Barnes
23
17
0
20 Jan 2020
Human-Aware Motion Deblurring
Ziyi Shen
Wenguan Wang
Xiankai Lu
Jianbing Shen
Haibin Ling
Tingfa Xu
Ling Shao
3DH
19
284
0
19 Jan 2020