ResearchTrend.AI
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015
Kelvin Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
    DiffM
arXiv: 1502.03044

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,507 papers shown
Title
CoSy: Evaluating Textual Explanations of Neurons
Laura Kopf
P. Bommer
Anna Hedström
Sebastian Lapuschkin
Marina M.-C. Höhne
Kirill Bykov
44
7
0
30 May 2024
Source Code Foundation Models are Transferable Binary Analysis Knowledge Bases
Zian Su
Xiangzhe Xu
Ziyang Huang
Kaiyuan Zhang
Xiangyu Zhang
32
5
0
30 May 2024
SIG: Efficient Self-Interpretable Graph Neural Network for Continuous-time Dynamic Graphs
Lanting Fang
Yulian Yang
Kai Wang
Shanshan Feng
Kaiyu Feng
Jie Gui
Shuliang Wang
Y. Ong
32
1
0
29 May 2024
BRACTIVE: A Brain Activation Approach to Human Visual Brain Learning
Xuan-Bac Nguyen
Hojin Jang
Xin Li
Samee U. Khan
Pawan Sinha
Khoa Luu
38
3
0
29 May 2024
mTREE: Multi-Level Text-Guided Representation End-to-End Learning for Whole Slide Image Analysis
Quan Liu
Ruining Deng
Can Cui
Tianyuan Yao
V. Nath
Yucheng Tang
Yuankai Huo
32
0
0
28 May 2024
Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR
Zhenyang Li
Yangyang Guo
Ke-Jyun Wang
Xiaolin Chen
Liqiang Nie
Mohan S. Kankanhalli
LRM
23
8
0
27 May 2024
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
Yunqi Zhang
Songda Li
Chunyuan Deng
Luyi Wang
Hui Zhao
29
0
0
27 May 2024
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
Yue Zhang
Hehe Fan
Yi Yang
43
3
0
24 May 2024
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
Fei Zhao
Taotian Pang
Chunhui Li
Zhen Wu
Junjie Guo
Shangyu Xing
Xinyu Dai
47
7
0
23 May 2024
Towards Retrieval-Augmented Architectures for Image Captioning
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Alessandro Nicolosi
Rita Cucchiara
VLM
19
9
0
21 May 2024
Like Humans to Few-Shot Learning through Knowledge Permeation of Vision and Text
Yuyu Jia
Qing Zhou
Wei Huang
Junyu Gao
Qi Wang
VLM
22
1
0
21 May 2024
Predicting and Explaining Hearing Aid Usage Using Encoder-Decoder with Attention Mechanism and SHAP
Qiqi Su
Eleftheria Iliadou
19
1
0
18 May 2024
Automated Radiology Report Generation: A Review of Recent Advances
Phillip Sloan
Philip Clatworthy
Edwin Simpson
Majid Mirmehdi
30
17
0
17 May 2024
Faithful Attention Explainer: Verbalizing Decisions Based on Discriminative Features
Yao Rong
David Scheerer
Enkelejda Kasneci
40
0
0
16 May 2024
Spatial Semantic Recurrent Mining for Referring Image Segmentation
Jiaxing Yang
Lihe Zhang
Jiayu Sun
Huchuan Lu
21
0
0
15 May 2024
CSA-Net: Channel-wise Spatially Autocorrelated Attention Networks
Nick Nikzad
Yongsheng Gao
Jun Zhou
21
0
0
09 May 2024
Temporal and Heterogeneous Graph Neural Network for Remaining Useful Life Prediction
Zhihao Wen
Yuan Fang
Pengcheng Wei
Fayao Liu
Zhenghua Chen
Min-man Wu
AI4CE
22
2
0
07 May 2024
DVMSR: Distillated Vision Mamba for Efficient Super-Resolution
Xiaoyan Lei
Wenlong Zhang
Weifeng Cao
27
11
0
05 May 2024
SalFAU-Net: Saliency Fusion Attention U-Net for Salient Object Detection
Kassaw Abraham Mulat
Zhengyong Feng
Tegegne Solomon Eshetie
Ahmed Endris Hasen
31
0
0
05 May 2024
Explainable Interface for Human-Autonomy Teaming: A Survey
Xiangqi Kong
Yang Xing
Antonios Tsourdos
Ziyue Wang
Weisi Guo
Adolfo Perrusquía
Andreas Wikander
35
3
0
04 May 2024
Leveraging the Human Ventral Visual Stream to Improve Neural Network Robustness
Zhenan Shao
Linjian Ma
Bo Li
Diane M. Beck
AAML
31
3
0
04 May 2024
FITA: Fine-grained Image-Text Aligner for Radiology Report Generation
Honglong Yang
Hui Tang
Xiaomeng Li
MedIm
28
1
0
02 May 2024
Semi-supervised Text-based Person Search
Daming Gao
Yang Bai
Min Cao
Hao Dou
Mang Ye
Min Zhang
39
1
0
28 Apr 2024
Pre-training on High Definition X-ray Images: An Experimental Study
Xiao Wang
Yuehang Li
Wentao Wu
Jiandong Jin
Yao Rong
Bowei Jiang
Chuanfu Li
Jin Tang
MedIm
ViT
LM&MA
36
3
0
27 Apr 2024
SERPENT-VLM : Self-Refining Radiology Report Generation Using Vision Language Models
M. Kapadnis
Sohan Patnaik
Abhilash Nandy
Sourjyadip Ray
Pawan Goyal
Debdoot Sheet
VLM
27
3
0
27 Apr 2024
From Cognition to Computation: A Comparative Review of Human Attention and Transformer Architectures
Minglu Zhao
Dehong Xu
Tao Gao
40
4
0
25 Apr 2024
Understanding attention-based encoder-decoder networks: a case study with chess scoresheet recognition
Sergio Y. Hayashi
N. Hirata
43
0
0
23 Apr 2024
Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting
Fengyi Fu
Shancheng Fang
Weidong Chen
Zhendong Mao
ViT
VGen
26
4
0
19 Apr 2024
Resilience through Scene Context in Visual Referring Expression Generation
Simeon Junker
Sina Zarrieß
22
0
0
18 Apr 2024
Towards a Foundation Model for Partial Differential Equations: Multi-Operator Learning and Extrapolation
Jingmin Sun
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
AI4CE
28
14
0
18 Apr 2024
HANet: A Hierarchical Attention Network for Change Detection With Bitemporal Very-High-Resolution Remote Sensing Images
Chengxi Han
Chen Wu
Haonan Guo
Meiqi Hu
Hongruixuan Chen
23
88
0
14 Apr 2024
StreakNet-Arch: An Anti-scattering Network-based Architecture for Underwater Carrier LiDAR-Radar Imaging
Xuelong Li
Hongjun An
Guangying Li
Xing Wang
Guanghua Cheng
Zhe Sun
36
0
0
14 Apr 2024
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
Övgü Özdemir
Erdem Akagündüz
36
10
0
12 Apr 2024
A Mutual Inclusion Mechanism for Precise Boundary Segmentation in Medical Images
Yizhi Pan
Junyi Xin
Tianhua Yang
Teeradaj Racharak
Le-Minh Nguyen
Guanqun Sun
19
3
0
12 Apr 2024
FLoRA: Enhancing Vision-Language Models with Parameter-Efficient Federated Learning
Duy Phuong Nguyen
J. P. Muñoz
Ali Jannesari
VLM
29
6
0
12 Apr 2024
Exploring the Necessity of Visual Modality in Multimodal Machine Translation using Authentic Datasets
Zi Long
Zhenhao Tang
Xianghua Fu
Jian Chen
Shilong Hou
Jinze Lyu
34
2
0
09 Apr 2024
Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation
Danpei Zhao
Bo Yuan
Ziqiang Chen
Tian Li
Zhuoran Liu
Wentao Li
Yue Gao
39
10
0
06 Apr 2024
A Bi-consolidating Model for Joint Relational Triple Extraction
Xiaocheng Luo
Yanping Chen
Ruixue Tang
Caiwei Yang
Ruizhang Huang
Yongbin Qin
35
0
0
05 Apr 2024
AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale
Adam Pardyl
Michal Wronka
Maciej Wolczyk
Kamil Adamczewski
Tomasz Trzciński
Bartosz Zieliński
33
2
0
04 Apr 2024
Memory-based Cross-modal Semantic Alignment Network for Radiology Report Generation
Yitian Tao
Liyan Ma
Jing Yu
Han Zhang
MedIm
28
6
0
31 Mar 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
A. Kazerouni
I. Hacihaliloglu
Dorit Merhof
41
7
0
28 Mar 2024
De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts
Yuzheng Wang
Dingkang Yang
Zhaoyu Chen
Yang Liu
Siao Liu
Wenqiang Zhang
Lihua Zhang
Lizhe Qi
32
6
0
28 Mar 2024
Text Data-Centric Image Captioning with Interactive Prompts
Yiyu Wang
Hao Luo
Jungang Xu
Yingfei Sun
Fan Wang
VLM
30
0
0
28 Mar 2024
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching
Yang Yang
36
0
0
26 Mar 2024
Selectively Informative Description can Reduce Undesired Embedding Entanglements in Text-to-Image Personalization
Jimyeong Kim
Jungwon Park
Wonjong Rhee
DiffM
30
5
0
22 Mar 2024
TiBiX: Leveraging Temporal Information for Bidirectional X-ray and Report Generation
Santosh Sanjeev
F. Maani
Arsen Abzhanov
Vijay Ram Papineni
Ibrahim Almakky
Bartłomiej W. Papież
Mohammad Yaqub
MedIm
58
0
0
20 Mar 2024
HyperFusion: A Hypernetwork Approach to Multimodal Integration of Tabular and Medical Imaging Data for Predictive Modeling
Daniel Duenias
Brennan Nichyporuk
Tal Arbel
Tammy Riklin-Raviv
34
3
0
20 Mar 2024
Training A Small Emotional Vision Language Model for Visual Art Comprehension
Jing Zhang
Liang Zheng
Meng Wang
Dan Guo
VLM
22
4
0
17 Mar 2024
LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrival
Yuanxin Zhao
Mi Zhang
Bingnan Yang
Zhan Zhang
Jiaju Kang
Jianya Gong
30
2
0
16 Mar 2024
Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models
Yu-Chu Yu
Chi-Pin Huang
Jr-Jen Chen
Kai-Po Chang
Yung-Hsuan Lai
Fu-En Yang
Yu-Chiang Frank Wang
CLL
VLM
37
7
0
14 Mar 2024