Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015
Ke Xu
Jimmy Ba
Ryan Kiros
Dong Wang
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
    DiffM
arXiv: 1502.03044

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,580 papers shown
Towards Context-Aware Emotion Recognition Debiasing from a Causal Demystification Perspective via De-confounded Training
Dingkang Yang
Kun Yang
Haopeng Kuang
Zhaoyu Chen
Yuzheng Wang
Lihua Zhang
CML
199
14
0
06 Jul 2024
Explainable Image Captioning using CNN- CNN architecture and Hierarchical Attention
Rishi Mohan
Sanjay Sureshkumar
Vignesh Sivasubramaniam
153
2
0
28 Jun 2024
Analyzing Quality, Bias, and Performance in Text-to-Image Generative Models
Nila Masrourisaadat
Nazanin Sedaghatkish
Fatemeh Sarshartehrani
Edward A. Fox
347
13
0
28 Jun 2024
Brain Tumor Classification using Vision Transformer with Selective Cross-Attention Mechanism and Feature Calibration
M. Khaniki
Alireza Golkarieh
Mohammad Manthouri
MedIm
173
9
0
25 Jun 2024
Enhancing Scientific Figure Captioning Through Cross-modal Learning
Mateo Alejandro Rojas
Rafael Carranza
194
0
0
24 Jun 2024
Reading Is Believing: Revisiting Language Bottleneck Models for Image Classification
Honori Udo
Takafumi Koshinaka
VLM
184
0
0
22 Jun 2024
A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning
Panagiotis Kaliosis
John Pavlopoulos
Foivos Charalampakos
Georgios Moschovis
Ion Androutsopoulos
MedIm
148
5
0
20 Jun 2024
Using Multimodal Large Language Models for Automated Detection of Traffic Safety Critical Events
M. Tami
Huthaifa I. Ashqar
Mohammed Elhenawy
276
8
0
19 Jun 2024
DDLNet: Boosting Remote Sensing Change Detection with Dual-Domain Learning
IEEE International Conference on Multimedia and Expo (ICME), 2024
Xiaowen Ma
Jiawei Yang
Rui Che
Huanting Zhang
Wei Zhang
184
21
0
19 Jun 2024
M3T: Multi-Modal Medical Transformer to bridge Clinical Context with Visual Insights for Retinal Image Medical Description Generation
International Conference on Information Photonics (ICIP), 2024
Nagur Shareef Shaik
T. Cherukuri
Dong Hye Ye
MedIm
265
2
0
19 Jun 2024
Improving Large Models with Small models: Lower Costs and Better Performance
Dong Chen
Shuo Zhang
Yueting Zhuang
Siliang Tang
Qidong Liu
Hua Wang
Mingliang Xu
208
12
0
15 Jun 2024
Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey
Hao Yang
Yanyan Zhao
Yang Wu
Shilong Wang
Tian Zheng
Hongbo Zhang
Zongyang Ma
Wanxiang Che
Bing Qin
351
35
0
12 Jun 2024
Stealthy Targeted Backdoor Attacks against Image Captioning
IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2024
Wenshu Fan
Hongwei Li
Wenbo Jiang
Meng Hao
Shui Yu
Xiao Zhang
DiffM
235
15
0
09 Jun 2024
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning
Wenyan Li
Jiaang Li
R. Ramos
Raphael Tang
Desmond Elliott
VLM
310
9
0
04 Jun 2024
CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models
Junho Kim
Hyunjun Kim
Yeonju Kim
Yong Man Ro
MLLM
222
31
0
04 Jun 2024
Story Generation from Visual Inputs: Techniques, Related Tasks, and Challenges
Daniel A. P. Oliveira
Eugénio Ribeiro
David Martins de Matos
VGen
228
4
0
04 Jun 2024
Ultrasound Report Generation with Cross-Modality Feature Alignment via Unsupervised Guidance
Jun Li
Tongkun Su
Baoliang Zhao
Faqin Lv
Qiong Wang
Nassir Navab
Yin Hu
Zhongliang Jiang
MedIm
230
17
0
02 Jun 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Weihao Ye
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
258
10
0
01 Jun 2024
DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models
Linli Yao
Lei Li
Shuhuai Ren
Lean Wang
Yuanxin Liu
Xu Sun
Lu Hou
216
59
0
31 May 2024
CoSy: Evaluating Textual Explanations of Neurons
Laura Kopf
P. Bommer
Anna Hedström
Sebastian Lapuschkin
Marina M.-C. Höhne
Kirill Bykov
204
19
0
30 May 2024
Source Code Foundation Models are Transferable Binary Analysis Knowledge Bases
Zian Su
Xiangzhe Xu
Ziyang Huang
Kaiyuan Zhang
Xiangyu Zhang
193
10
0
30 May 2024
SIG: Efficient Self-Interpretable Graph Neural Network for Continuous-time Dynamic Graphs
Lanting Fang
Yulian Yang
Kai Wang
Shanshan Feng
Kaiyu Feng
Jie Gui
Shuliang Wang
Yew-Soon Ong
244
2
0
29 May 2024
BRACTIVE: A Brain Activation Approach to Human Visual Brain Learning
Xuan-Bac Nguyen
Hojin Jang
Pawan Sinha
Samee U. Khan
Arabinda Kumar Choudhary
Khoa Luu
449
6
0
29 May 2024
mTREE: Multi-Level Text-Guided Representation End-to-End Learning for Whole Slide Image Analysis
Quan Liu
Ruining Deng
Can Cui
Tianyuan Yao
V. Nath
Yucheng Tang
Yuankai Huo
180
1
0
28 May 2024
Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR
Zhenyang Li
Yangyang Guo
Ke-Jyun Wang
Xiaolin Chen
Liqiang Nie
Mohan S. Kankanhalli
LRM
227
14
0
27 May 2024
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
Yunqi Zhang
Songda Li
Chunyuan Deng
Luyi Wang
Hui Zhao
336
0
0
27 May 2024
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
Yue Zhang
Hehe Fan
Yi Yang
289
6
0
24 May 2024
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
Fei Zhao
Taotian Pang
Chunhui Li
Zhen Wu
Junjie Guo
Shangyu Xing
Xinyu Dai
184
12
0
23 May 2024
Towards Retrieval-Augmented Architectures for Image Captioning
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Alessandro Nicolosi
Rita Cucchiara
VLM
241
19
0
21 May 2024
Like Humans to Few-Shot Learning through Knowledge Permeation of Vision and Text
Yuyu Jia
Qing Zhou
Wei Huang
Junyu Gao
Qi Wang
VLM
275
2
0
21 May 2024
Predicting and Explaining Hearing Aid Usage Using Encoder-Decoder with Attention Mechanism and SHAP
International Conference on Signal-Image Technology and Internet-Based Systems (SITIS), 2022
Qiqi Su
Eleftheria Iliadou
133
1
0
18 May 2024
Automated Radiology Report Generation: A Review of Recent Advances
IEEE Reviews in Biomedical Engineering (RBME), 2024
Phillip Sloan
Philip Clatworthy
Edwin Simpson
Majid Mirmehdi
252
63
0
17 May 2024
Faithful Attention Explainer: Verbalizing Decisions Based on Discriminative Features
Yao Rong
David Scheerer
Enkelejda Kasneci
266
1
0
16 May 2024
Spatial Semantic Recurrent Mining for Referring Image Segmentation
Jiaxing Yang
Lihe Zhang
Jiayu Sun
Huchuan Lu
303
1
0
15 May 2024
CSA-Net: Channel-wise Spatially Autocorrelated Attention Networks
Nick Nikzad
Yongsheng Gao
Jun Zhou
236
2
0
09 May 2024
Temporal and Heterogeneous Graph Neural Network for Remaining Useful Life Prediction
Zhihao Wen
Yuan Fang
Pengcheng Wei
Fayao Liu
Zhenghua Chen
Ruibing Jin
AI4CE
285
11
0
07 May 2024
DVMSR: Distillated Vision Mamba for Efficient Super-Resolution
Xiaoyan Lei
Wenlong Zhang
Weifeng Cao
397
32
0
05 May 2024
SalFAU-Net: Saliency Fusion Attention U-Net for Salient Object Detection
Kassaw Abraham Mulat
Zhengyong Feng
Tegegne Solomon Eshetie
Ahmed Endris Hasen
211
2
0
05 May 2024
Explainable Interface for Human-Autonomy Teaming: A Survey
Xiangqi Kong
Yang Xing
Antonios Tsourdos
Ziyue Wang
Weisi Guo
Adolfo Perrusquía
Andreas Wikander
269
8
0
04 May 2024
FITA: Fine-grained Image-Text Aligner for Radiology Report Generation
Honglong Yang
Hui Tang
Xiaomeng Li
MedIm
207
3
0
02 May 2024
Enhanced Textual Feature Extraction for Visual Question Answering: A Simple Convolutional Approach
Zhilin Zhang
Fangyu Wu
201
0
0
01 May 2024
Semi-supervised Text-based Person Search
Daming Gao
Yang Bai
Min Cao
Hao Dou
Mang Ye
Min Zhang
216
2
0
28 Apr 2024
Pre-training on High Definition X-ray Images: An Experimental Study
Tianlin Li
Yuehang Li
Wentao Wu
Jiandong Jin
Yao Rong
Bowei Jiang
Chuanfu Li
Jin Tang
MedIm
ViT
LM&MA
270
6
0
27 Apr 2024
SERPENT-VLM: Self-Refining Radiology Report Generation Using Vision Language Models
M. Kapadnis
Sohan Patnaik
Abhilash Nandy
Sourjyadip Ray
Pawan Goyal
Debdoot Sheet
VLM
180
16
0
27 Apr 2024
From Cognition to Computation: A Comparative Review of Human Attention and Transformer Architectures
Minglu Zhao
Dehong Xu
Tao Gao
126
7
0
25 Apr 2024
Understanding attention-based encoder-decoder networks: a case study with chess scoresheet recognition
Sergio Y. Hayashi
N. Hirata
199
0
0
23 Apr 2024
Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting
Fengyi Fu
Shancheng Fang
Weidong Chen
Zhendong Mao
ViT
VGen
179
5
0
19 Apr 2024
Resilience through Scene Context in Visual Referring Expression Generation
Simeon Junker
Sina Zarrieß
132
4
0
18 Apr 2024
Towards a Foundation Model for Partial Differential Equations: Multi-Operator Learning and Extrapolation
Jingmin Sun
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
AI4CE
406
39
0
18 Apr 2024
HANet: A Hierarchical Attention Network for Change Detection With Bitemporal Very-High-Resolution Remote Sensing Images
Chengxi Han
Chen Wu
Haonan Guo
Meiqi Hu
Hongruixuan Chen
262
164
0
14 Apr 2024