arXiv:1502.03044
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
10 February 2015
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
50 / 3,507 papers shown
CoSy: Evaluating Textual Explanations of Neurons
Laura Kopf
P. Bommer
Anna Hedström
Sebastian Lapuschkin
Marina M.-C. Höhne
Kirill Bykov
44
7
0
30 May 2024
Source Code Foundation Models are Transferable Binary Analysis Knowledge Bases
Zian Su
Xiangzhe Xu
Ziyang Huang
Kaiyuan Zhang
Xiangyu Zhang
32
5
0
30 May 2024
SIG: Efficient Self-Interpretable Graph Neural Network for Continuous-time Dynamic Graphs
Lanting Fang
Yulian Yang
Kai Wang
Shanshan Feng
Kaiyu Feng
Jie Gui
Shuliang Wang
Y. Ong
32
1
0
29 May 2024
BRACTIVE: A Brain Activation Approach to Human Visual Brain Learning
Xuan-Bac Nguyen
Hojin Jang
Xin Li
Samee U. Khan
Pawan Sinha
Khoa Luu
38
3
0
29 May 2024
mTREE: Multi-Level Text-Guided Representation End-to-End Learning for Whole Slide Image Analysis
Quan Liu
Ruining Deng
Can Cui
Tianyuan Yao
V. Nath
Yucheng Tang
Yuankai Huo
32
0
0
28 May 2024
Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR
Zhenyang Li
Yangyang Guo
Ke-Jyun Wang
Xiaolin Chen
Liqiang Nie
Mohan S. Kankanhalli
LRM
23
8
0
27 May 2024
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
Yunqi Zhang
Songda Li
Chunyuan Deng
Luyi Wang
Hui Zhao
29
0
0
27 May 2024
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
Yue Zhang
Hehe Fan
Yi Yang
43
3
0
24 May 2024
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
Fei Zhao
Taotian Pang
Chunhui Li
Zhen Wu
Junjie Guo
Shangyu Xing
Xinyu Dai
47
7
0
23 May 2024
Towards Retrieval-Augmented Architectures for Image Captioning
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Alessandro Nicolosi
Rita Cucchiara
VLM
19
9
0
21 May 2024
Like Humans to Few-Shot Learning through Knowledge Permeation of Vision and Text
Yuyu Jia
Qing Zhou
Wei Huang
Junyu Gao
Qi. Wang
VLM
22
1
0
21 May 2024
Predicting and Explaining Hearing Aid Usage Using Encoder-Decoder with Attention Mechanism and SHAP
Qiqi Su
Eleftheria Iliadou
19
1
0
18 May 2024
Automated Radiology Report Generation: A Review of Recent Advances
Phillip Sloan
Philip Clatworthy
Edwin Simpson
Majid Mirmehdi
30
17
0
17 May 2024
Faithful Attention Explainer: Verbalizing Decisions Based on Discriminative Features
Yao Rong
David Scheerer
Enkelejda Kasneci
40
0
0
16 May 2024
Spatial Semantic Recurrent Mining for Referring Image Segmentation
Jiaxing Yang
Lihe Zhang
Jiayu Sun
Huchuan Lu
21
0
0
15 May 2024
CSA-Net: Channel-wise Spatially Autocorrelated Attention Networks
Nick Nikzad
Yongsheng Gao
Jun Zhou
21
0
0
09 May 2024
Temporal and Heterogeneous Graph Neural Network for Remaining Useful Life Prediction
Zhihao Wen
Yuan Fang
Pengcheng Wei
Fayao Liu
Zhenghua Chen
Min-man Wu
AI4CE
22
2
0
07 May 2024
DVMSR: Distillated Vision Mamba for Efficient Super-Resolution
Xiaoyan Lei
Wenlong Zhang
Weifeng Cao
27
11
0
05 May 2024
SalFAU-Net: Saliency Fusion Attention U-Net for Salient Object Detection
Kassaw Abraham Mulat
Zhengyong Feng
Tegegne Solomon Eshetie
Ahmed Endris Hasen
31
0
0
05 May 2024
Explainable Interface for Human-Autonomy Teaming: A Survey
Xiangqi Kong
Yang Xing
Antonios Tsourdos
Ziyue Wang
Weisi Guo
Adolfo Perrusquía
Andreas Wikander
35
3
0
04 May 2024
Leveraging the Human Ventral Visual Stream to Improve Neural Network Robustness
Zhenan Shao
Linjian Ma
Bo Li
Diane M. Beck
AAML
31
3
0
04 May 2024
FITA: Fine-grained Image-Text Aligner for Radiology Report Generation
Honglong Yang
Hui Tang
Xiaomeng Li
MedIm
28
1
0
02 May 2024
Semi-supervised Text-based Person Search
Daming Gao
Yang Bai
Min Cao
Hao Dou
Mang Ye
Min Zhang
39
1
0
28 Apr 2024
Pre-training on High Definition X-ray Images: An Experimental Study
Xiao Wang
Yuehang Li
Wentao Wu
Jiandong Jin
Yao Rong
Bowei Jiang
Chuanfu Li
Jin Tang
MedIm
ViT
LM&MA
36
3
0
27 Apr 2024
SERPENT-VLM : Self-Refining Radiology Report Generation Using Vision Language Models
M. Kapadnis
Sohan Patnaik
Abhilash Nandy
Sourjyadip Ray
Pawan Goyal
Debdoot Sheet
VLM
27
3
0
27 Apr 2024
From Cognition to Computation: A Comparative Review of Human Attention and Transformer Architectures
Minglu Zhao
Dehong Xu
Tao Gao
40
4
0
25 Apr 2024
Understanding attention-based encoder-decoder networks: a case study with chess scoresheet recognition
Sergio Y. Hayashi
N. Hirata
43
0
0
23 Apr 2024
Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting
Fengyi Fu
Shancheng Fang
Weidong Chen
Zhendong Mao
ViT
VGen
26
4
0
19 Apr 2024
Resilience through Scene Context in Visual Referring Expression Generation
Simeon Junker
Sina Zarrieß
22
0
0
18 Apr 2024
Towards a Foundation Model for Partial Differential Equations: Multi-Operator Learning and Extrapolation
Jingmin Sun
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
AI4CE
28
14
0
18 Apr 2024
HANet: A Hierarchical Attention Network for Change Detection With Bitemporal Very-High-Resolution Remote Sensing Images
Chengxi Han
Chen Wu
Haonan Guo
Meiqi Hu
Hongruixuan Chen
23
88
0
14 Apr 2024
StreakNet-Arch: An Anti-scattering Network-based Architecture for Underwater Carrier LiDAR-Radar Imaging
Xuelong Li
Hongjun An
Guangying Li
Xing Wang
Guanghua Cheng
Zhe Sun
36
0
0
14 Apr 2024
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
Övgü Özdemir
Erdem Akagündüz
36
10
0
12 Apr 2024
A Mutual Inclusion Mechanism for Precise Boundary Segmentation in Medical Images
Yizhi Pan
Junyi Xin
Tianhua Yang
Teeradaj Racharak
Le-Minh Nguyen
Guanqun Sun
19
3
0
12 Apr 2024
FLoRA: Enhancing Vision-Language Models with Parameter-Efficient Federated Learning
Duy Phuong Nguyen
J. P. Muñoz
Ali Jannesari
VLM
29
6
0
12 Apr 2024
Exploring the Necessity of Visual Modality in Multimodal Machine Translation using Authentic Datasets
Zi Long
Zhenhao Tang
Xianghua Fu
Jian Chen
Shilong Hou
Jinze Lyu
34
2
0
09 Apr 2024
Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation
Danpei Zhao
Bo Yuan
Ziqiang Chen
Tian Li
Zhuoran Liu
Wentao Li
Yue Gao
39
10
0
06 Apr 2024
A Bi-consolidating Model for Joint Relational Triple Extraction
Xiaocheng Luo
Yanping Chen
Ruixue Tang
Caiwei Yang
Ruizhang Huang
Yongbin Qin
35
0
0
05 Apr 2024
AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale
Adam Pardyl
Michal Wronka
Maciej Wolczyk
Kamil Adamczewski
Tomasz Trzciński
Bartosz Zieliński
33
2
0
04 Apr 2024
Memory-based Cross-modal Semantic Alignment Network for Radiology Report Generation
Yitian Tao
Liyan Ma
Jing Yu
Han Zhang
MedIm
28
6
0
31 Mar 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
A. Kazerouni
I. Hacihaliloglu
Dorit Merhof
41
7
0
28 Mar 2024
De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts
Yuzheng Wang
Dingkang Yang
Zhaoyu Chen
Yang Liu
Siao Liu
Wenqiang Zhang
Lihua Zhang
Lizhe Qi
32
6
0
28 Mar 2024
Text Data-Centric Image Captioning with Interactive Prompts
Yiyu Wang
Hao Luo
Jungang Xu
Yingfei Sun
Fan Wang
VLM
30
0
0
28 Mar 2024
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching
Yang Yang
36
0
0
26 Mar 2024
Selectively Informative Description can Reduce Undesired Embedding Entanglements in Text-to-Image Personalization
Jimyeong Kim
Jungwon Park
Wonjong Rhee
DiffM
30
5
0
22 Mar 2024
TiBiX: Leveraging Temporal Information for Bidirectional X-ray and Report Generation
Santosh Sanjeev
F. Maani
Arsen Abzhanov
Vijay Ram Papineni
Ibrahim Almakky
Bartlomiej W. Papież
Mohammad Yaqub
MedIm
58
0
0
20 Mar 2024
HyperFusion: A Hypernetwork Approach to Multimodal Integration of Tabular and Medical Imaging Data for Predictive Modeling
Daniel Duenias
Brennan Nichyporuk
Tal Arbel
Tammy Riklin-Raviv
34
3
0
20 Mar 2024
Training A Small Emotional Vision Language Model for Visual Art Comprehension
Jing Zhang
Liang Zheng
Meng Wang
Dan Guo
VLM
22
4
0
17 Mar 2024
LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrival
Yuanxin Zhao
Mi Zhang
Bingnan Yang
Zhan Zhang
Jiaju Kang
Jianya Gong
30
2
0
16 Mar 2024
Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models
Yu-Chu Yu
Chi-Pin Huang
Jr-Jen Chen
Kai-Po Chang
Yung-Hsuan Lai
Fu-En Yang
Yu-Chiang Frank Wang
CLL
VLM
37
7
0
14 Mar 2024