Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015
Kelvin Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
    DiffM

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,580 papers shown
Improving Face Recognition from Caption Supervision with Multi-Granular Contextual Feature Aggregation
Md Golam Moula Mehedi Hasan
Nasser M. Nasrabadi
CVBM
85
2
0
13 Aug 2023
Benign Shortcut for Debiasing: Fair Visual Recognition via Intervention with Shortcut Features
ACM Multimedia (ACM MM), 2023
Yi Zhang
Jitao Sang
Junyan Wang
Shihong Deng
Yaowei Wang
178
9
0
13 Aug 2023
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
International Conference on Learning Representations (ICLR), 2023
Juncheng Li
Kaihang Pan
Zhiqi Ge
Minghe Gao
Wei Ji
Wenqiao Zhang
Tat-Seng Chua
Siliang Tang
Hanwang Zhang
Yueting Zhuang
MLLM
284
89
0
08 Aug 2023
D-Score: A Synapse-Inspired Approach for Filter Pruning
Doyoung Park
Jinsoo Kim
Ji-Min Nam
Jooyoung Chang
S. Park
102
0
0
08 Aug 2023
Asynchronous Evolution of Deep Neural Network Architectures
Applied Soft Computing (Appl. Soft Comput.), 2023
J. Liang
Hormoz Shahrzad
Risto Miikkulainen
310
1
0
08 Aug 2023
A Comprehensive Analysis of Real-World Image Captioning and Scene Identification
Sai Suprabhanu Nallapaneni
Subrahmanyam Konakanchi
194
2
0
05 Aug 2023
Frustratingly Easy Model Generalization by Dummy Risk Minimization
Juncheng Wang
Yongfeng Zhang
Xixu Hu
Shujun Wang
Xingxu Xie
213
3
0
04 Aug 2023
Reverse Stable Diffusion: What prompt was used to generate this image?
Computer Vision and Image Understanding (CVIU), 2023
Florinel-Alin Croitoru
Vlad Hondru
Radu Tudor Ionescu
M. Shah
VLM, DiffM
276
10
0
02 Aug 2023
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model
ACM Multimedia (ACM MM), 2023
Ka Leong Cheng
Wenpo Song
Zheng Ma
Wenhao Zhu
Zi-Yue Zhu
Jianbing Zhang
CLIP, VLM
170
18
0
02 Aug 2023
EEG-based Cognitive Load Classification using Feature Masked Autoencoding and Emotion Transfer Learning
International Conference on Multimodal Interaction (ICMI), 2023
Dustin Pulver
Prithila Angkan
Paul Hungler
Ali Etemad
261
15
0
01 Aug 2023
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning
IEEE International Conference on Computer Vision (ICCV), 2023
Junjie Fei
Teng Wang
Jinrui Zhang
Zhenyu He
Chengjie Wang
Feng Zheng
VLM
161
64
0
31 Jul 2023
Triple Correlations-Guided Label Supplementation for Unbiased Video Scene Graph Generation
ACM Multimedia (ACM MM), 2023
Wenqing Wang
Kaifeng Gao
Yawei Luo
Tao Jiang
Fei Gao
Jian Shao
Jianwen Sun
Jun Xiao
226
5
0
30 Jul 2023
DRL4Route: A Deep Reinforcement Learning Framework for Pick-up and Delivery Route Prediction
Knowledge Discovery and Data Mining (KDD), 2023
Xiaowei Mao
Haomin Wen
Hengrui Zhang
Huaiyu Wan
Lixia Wu
Jianbin Zheng
Haoyuan Hu
Youfang Lin
AI4TS
248
17
0
30 Jul 2023
Synaptic Plasticity Models and Bio-Inspired Unsupervised Deep Learning: A Survey
Gabriele Lagani
Fabrizio Falchi
Claudio Gennaro
Giuseppe Amato
AAML
236
8
0
30 Jul 2023
RSGPT: A Remote Sensing Vision Language Model and Benchmark
ISPRS Journal of Photogrammetry and Remote Sensing (ISPRS J. Photogramm. Remote Sens.), 2023
Yuan Hu
Jianlong Yuan
Congcong Wen
Xiaonan Lu
Xiang Li
VLM
265
205
0
28 Jul 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Kun Yuan
V. Srivastav
Tong Yu
Joël L. Lavanchy
J. Marescaux
Pietro Mascagni
Nassir Navab
N. Padoy
691
44
0
27 Jul 2023
Fact-Checking of AI-Generated Reports
Razi Mahmood
Diego Machado Reyes
Ge Wang
Mannudeep Kalra
Pingkun Yan
MedIm
191
8
0
27 Jul 2023
On the Learning Dynamics of Attention Networks
European Conference on Artificial Intelligence (ECAI), 2023
Rahul Vashisht
H. G. Ramaswamy
281
1
0
25 Jul 2023
Enhancing image captioning with depth information using a Transformer-based framework
Aya Mahmoud Ahmed
Mohamed Yousef
K. Hussain
Yousef B. Mahdy
ViT
207
5
0
24 Jul 2023
Actor-agnostic Multi-label Action Recognition with Multi-modal Query
Anindya Mondal
Sauradip Nag
J. Prada
Xiatian Zhu
Anjan Dutta
253
14
0
20 Jul 2023
Class Attention to Regions of Lesion for Imbalanced Medical Image Recognition
Neurocomputing, 2023
Jia-Xin Zhuang
Jiabin Cai
Jianguo Zhang
Wei-Shi Zheng
Ruixuan Wang
191
20
0
19 Jul 2023
Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning
IEEE Transactions on Multimedia (IEEE TMM), 2023
Zijie Song
Zhenzhen Hu
Yuanen Zhou
Ye Zhao
Richang Hong
Meng Wang
203
18
0
19 Jul 2023
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Chaoyang Zhu
Long Chen
ObjD, VLM
507
67
0
18 Jul 2023
Human Action Recognition in Still Images Using ConViT
Seyed Rohollah Hosseyni
Sanaz Seyedin
Hasan Taheri
ViT
179
2
0
18 Jul 2023
GenAssist: Making Image Generation Accessible
ACM Symposium on User Interface Software and Technology (UIST), 2023
Mina Huh
Yi-Hao Peng
Amy Pavel
DiffM
212
54
0
14 Jul 2023
AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes
Guoyun Tu
Ying Liu
Vladimir Vlassov
258
1
0
14 Jul 2023
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Neural Information Processing Systems (NeurIPS), 2023
Yiren Jian
Chongyang Gao
Soroush Vosoughi
VLM, MLLM
380
44
0
13 Jul 2023
Is Task-Agnostic Explainable AI a Myth?
Alicja Chaszczewicz
221
2
0
13 Jul 2023
Reading Radiology Imaging Like The Radiologist
Yuhao Wang
MedIm
237
0
0
12 Jul 2023
DyCL: Dynamic Neural Network Compilation Via Program Rewriting and Graph Optimization
International Symposium on Software Testing and Analysis (ISSTA), 2023
Simin Chen
Shiyi Wei
Cong Liu
Wei Yang
176
11
0
11 Jul 2023
Undecimated Wavelet Transform for Word Embedded Semantic Marginal Autoencoder in Security improvement and Denoising different Languages
S. Shreyanth
47
0
0
06 Jul 2023
Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Bang-ju Yang
Fenglin Liu
Zheng Li
Qingyu Yin
Chenyu You
Bing Yin
Yuexian Zou
VLM
208
6
0
05 Jul 2023
Seeing in Words: Learning to Classify through Language Bottlenecks
Khalid Saifullah
Yuxin Wen
Jonas Geiping
Micah Goldblum
Tom Goldstein
VLM
133
2
0
29 Jun 2023
Variational latent discrete representation for time series modelling
Max H. Cohen
M. Charbit
Sylvain Le Corff
275
1
0
27 Jun 2023
Self-Supervised Image Captioning with CLIP
Chuanyang Jin
VLM, SSL
209
3
0
26 Jun 2023
Improving Reference-based Distinctive Image Captioning with Contrastive Rewards
Yangjun Mao
Jun Xiao
Dong Zhang
Meng Cao
Jian Shao
Yueting Zhuang
Long Chen
EGVM
200
10
0
25 Jun 2023
Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation
Neural Information Processing Systems (NeurIPS), 2023
Zihao Yue
Anwen Hu
Liang Zhang
Qin Jin
339
7
0
23 Jun 2023
Dense Video Object Captioning from Disjoint Supervision
International Conference on Learning Representations (ICLR), 2023
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
286
7
0
20 Jun 2023
KiUT: Knowledge-injected U-Transformer for Radiology Report Generation
Computer Vision and Pattern Recognition (CVPR), 2023
Zhongzhen Huang
Xiaofan Zhang
Shaoting Zhang
MedIm
254
92
0
20 Jun 2023
GraphGLOW: Universal and Generalizable Structure Learning for Graph Neural Networks
Knowledge Discovery and Data Mining (KDD), 2023
Wentao Zhao
Qitian Wu
Chenxiao Yang
Junchi Yan
184
19
0
20 Jun 2023
Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph Generation
International Conference on Multimedia Retrieval (ICMR), 2023
Shuo Chen
Yingjun Du
Pascal Mettes
Cees G. M. Snoek
OffRL
298
6
0
16 Jun 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
Lingxi Xie
Longhui Wei
Xiaopeng Zhang
Kaifeng Bi
Xiaotao Gu
Jianlong Chang
Qi Tian
250
9
0
14 Jun 2023
Top-Down Framework for Weakly-supervised Grounded Image Captioning
Chen Cai
Suchen Wang
Kim-Hui Yap
Yi Wang
ObjD
226
5
0
13 Jun 2023
Multimodal Explainable Artificial Intelligence: A Comprehensive Review of Methodological Advances and Future Research Directions
IEEE Access, 2023
N. Rodis
Christos Sardianos
Panagiotis I. Radoglou-Grammatikis
Panagiotis G. Sarigiannidis
Iraklis Varlamis
Georgios Th. Papadopoulos
333
38
0
09 Jun 2023
Customizing General-Purpose Foundation Models for Medical Report Generation
Bang-ju Yang
Asif Raza
Yuexian Zou
Tong Zhang
MedIm
173
14
0
09 Jun 2023
Object Detection with Transformers: A Review
Italian National Conference on Sensors (INS), 2023
Tahira Shehzadi
K. Hashmi
D. Stricker
Muhammad Zeshan Afzal
ViT, MU
413
53
0
07 Jun 2023
Towards Adaptable and Interactive Image Captioning with Data Augmentation and Episodic Memory
Aliki Anagnostopoulou
Mareike Hartmann
Daniel Sonntag
CLL, VLM
187
1
0
06 Jun 2023
Putting Humans in the Image Captioning Loop
Aliki Anagnostopoulou
Mareike Hartmann
Daniel Sonntag
VLM
131
3
0
06 Jun 2023
On the Role of Attention in Prompt-tuning
International Conference on Machine Learning (ICML), 2023
Samet Oymak
A. S. Rawat
Mahdi Soltanolkotabi
Christos Thrampoulidis
MLT, LRM
206
59
0
06 Jun 2023
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning
Jianghui Wang
Yuxuan Wang
Dongyan Zhao
Zilong Zheng
342
1
0
04 Jun 2023