Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015
Kelvin Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
    DiffM

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,580 papers shown
Dual-level Modality Debiasing Learning for Unsupervised Visible-Infrared Person Re-Identification
J. Li
Yan Lu
Bin Liu
Guojun Yin
Mang Ye
72
0
0
03 Dec 2025
Seeing What Matters: Visual Preference Policy Optimization for Visual Generation
Ziqi Ni
Yuanzhi Liang
Rui Li
Yi Zhou
H. Huang
Chi Zhang
Xuelong Li
107
0
0
24 Nov 2025
Pharmacophore-based design by learning on voxel grids
Omar Mahmood
Pedro H. O. Pinheiro
Richard Bonneau
Saeed Saremi
Vishnu Sresht
64
0
0
19 Nov 2025
Medical Report Generation: A Hierarchical Task Structure-Based Cross-Modal Causal Intervention Framework
Yucheng Song
Yifan Ge
Junhao Li
Zhining Liao
Zhifang Liao
81
0
0
04 Nov 2025
SilhouetteTell: Practical Video Identification Leveraging Blurred Recordings of Video Subtitles
Guanchong Huang
Song Fang
103
0
0
31 Oct 2025
FLoC: Facility Location-Based Efficient Visual Token Compression for Long Video Understanding
Janghoon Cho
Jungsoo Lee
Munawar Hayat
Kyuwoong Hwang
Fatih Porikli
Sungha Choi
84
0
0
31 Oct 2025
Generating Accurate and Detailed Captions for High-Resolution Images
Hankyeol Lee
Gawon Seo
Kyounggyu Lee
Dogun Kim
Kyungwoo Song
Jiyoung Jung
MLLM
VLM
217
0
0
31 Oct 2025
Transformers in Medicine: Improving Vision-Language Alignment for Medical Image Captioning
Yogesh Thakku Suresh
Vishwajeet Shivaji Hogale
Luca-Alexandru Zamfira
Anandavardhana Hegde
MedIm
LM&MA
430
0
0
29 Oct 2025
MedXplain-VQA: Multi-Component Explainable Medical Visual Question Answering
Hai-Dang Nguyen
Minh-Anh Dang
Minh-Tan Le
Minh-Tuan Le
95
1
0
26 Oct 2025
StarBench: A Turn-Based RPG Benchmark for Agentic Multimodal Decision-Making and Information Seeking
Haoran Zhang
C. Zhu
Sicong Guo
Hanzhe Guo
Haiming Li
Donglin Yu
126
0
0
21 Oct 2025
MatchAttention: Matching the Relative Positions for High-Resolution Cross-View Matching
Tingman Yan
Tao Liu
Xilian Yang
Qunfei Zhao
Zeyang Xia
3DV
215
0
0
16 Oct 2025
MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites
Zhenxin Lei
Zhangwei Gao
Changyao Tian
Erfei Cui
Guanzhou Chen
...
Xiangyu Zhao
Jiayi Ji
Yu Qiao
Wenhai Wang
Gen Luo
VLM
245
0
0
14 Oct 2025
Convolutional Attention in Betting Exchange Markets
Rui Gonçalves
Vitor Miguel Ribeiro
Roman Chertovskih
António Pedro Aguiar
44
0
0
14 Oct 2025
Image-to-Video Transfer Learning based on Image-Language Foundation Models: A Comprehensive Survey
Jinxuan Li
Chaolei Tan
Haoxuan Chen
Jianxin Ma
Jian-Fang Hu
Wei-Shi Zheng
Jianhuang Lai
VLM
141
1
0
12 Oct 2025
AI-Driven Radiology Report Generation for Traumatic Brain Injuries
Riadh Bouslimi
Houda Trabelsi
Wahiba Ben Abdssalem Karaa
Hana Hedhli
MedIm
112
4
0
09 Oct 2025
Uncertainty in Machine Learning
Hans Weytjens
Wouter Verbeke
UD
245
0
0
07 Oct 2025
The Transformer Cookbook
Andy Yang
Christopher Watson
Anton Xue
S. Bhattamishra
Jose Llarena
William Merrill
Emile Dos Santos Ferreira
Anej Svete
David Chiang
141
0
0
01 Oct 2025
MCM-DPO: Multifaceted Cross-Modal Direct Preference Optimization for Alt-text Generation
Jinlan Fu
Shenzhen Huangfu
Hao Fei
Yichong Huang
Xiaoyu Shen
Xipeng Qiu
See-Kiong Ng
93
0
0
01 Oct 2025
FinCap: Topic-Aligned Captions for Short-Form Financial YouTube Videos
Siddhant Sukhani
Yash Bhardwaj
Riya Bhadani
Veer Kejriwal
Michael Galarnyk
Sudheer Chava
80
0
0
30 Sep 2025
Understanding Cognitive States from Head & Hand Motion Data
Kaiang Wen
Mark Roman Miller
84
0
0
29 Sep 2025
Diff-3DCap: Shape Captioning with Diffusion Models
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2025
Zhenyu Shu
Jiawei Wen
Shiyang Li
Shiqing Xin
Ligang Liu
DiffM
123
0
0
28 Sep 2025
Universal Multi-Domain Translation via Diffusion Routers
Duc Kieu
Kien Do
Tuan Hoang
T. Le
Tung Kieu
D. Nguyen
T. Nguyen
116
0
0
26 Sep 2025
Unlocking Financial Insights: An advanced Multimodal Summarization with Multimodal Output Framework for Financial Advisory Videos
Sarmistha Das
R E Zera Lyngkhoi
Sriparna Saha
Alka Maurya
VGen
116
0
0
25 Sep 2025
An overview of neural architectures for self-supervised audio representation learning from masked spectrograms
Sarthak Yadav
Sergios Theodoridis
Zheng-Hua Tan
Mamba
187
0
0
23 Sep 2025
Pre-Trained CNN Architecture for Transformer-Based Image Caption Generation Model
Amanuel Tafese Dufera
ViT
VLM
132
0
0
22 Sep 2025
DeepEyeNet: Generating Medical Report for Retinal Images
Jia-Hong Huang
MedIm
166
1
0
16 Sep 2025
Simulating Sinogram-Domain Motion and Correcting Image-Domain Artifacts Using Deep Learning in HR-pQCT Bone Imaging
IEEE Transactions on Radiation and Plasma Medical Sciences (TRPMS), 2025
Farhan Sadik
Christopher L. Newman
Stuart J. Warden
Rachel K. Surowiec
MedIm
141
0
0
13 Sep 2025
Zero-shot Hierarchical Plant Segmentation via Foundation Segmentation Models and Text-to-image Attention
Junhao Xing
Ryohei Miyakawa
Yang Yang
Xinpeng Liu
Risa Shinoda
Hiroaki Santo
Yosuke Toda
Fumio Okura
VLM
173
0
0
11 Sep 2025
Compressing CNN models for resource-constrained systems by channel and layer pruning
Ahmed Sadaqa
Di Liu
156
0
0
10 Sep 2025
Teaching AI Stepwise Diagnostic Reasoning with Report-Guided Chain-of-Thought Learning
Yihong Luo
Wenwu He
Zhuo-Xu Cui
Dong Liang
LM&MA
LRM
83
0
0
08 Sep 2025
Lesion-Aware Visual-Language Fusion for Automated Image Captioning of Ulcerative Colitis Endoscopic Examinations
Alexis Ivan Lopez Escamilla
Gilberto Ochoa
Sharib Al
MedIm
77
0
0
03 Sep 2025
Omnidirectional Spatial Modeling from Correlated Panoramas
Xinshen Zhang
Tongxi Fu
Xu Zheng
141
1
0
02 Sep 2025
OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning
Yanqing Liu
Xianhang Li
Letian Zhang
Zirui Wang
Zeyu Zheng
Yuyin Zhou
Cihang Xie
VLM
197
2
0
01 Sep 2025
Automatic Identification and Description of Jewelry Through Computer Vision and Neural Networks for Translators and Interpreters
Applied Sciences (AS), 2025
José M. Alcalde-Llergo
Aurora Ruiz-Mezcua
Rocio Avila-Ramirez
Andrea Zingoni
Juri Taborri
Enrique Yeguas-Bolivar
124
0
0
31 Aug 2025
Event-Enriched Image Analysis Grand Challenge at ACM Multimedia 2025
T. Tran
Minh-Quang Nguyen
Minh-Triet Tran
Tam V. Nguyen
Trong-Le Do
Duy-Nam Ly
Viet-Tham Huynh
Khanh-Duy Le
Mai-Khiem Tran
Trung-Truc Huynh-Le
VGen
104
0
0
26 Aug 2025
From Basic Affordances to Symbolic Thought: A Computational Phylogenesis of Biological Intelligence
John E. Hummel
Rachel Heaton
79
0
0
20 Aug 2025
AGIC: Attention-Guided Image Captioning to Improve Caption Relevance
L. D. M. S. Sai Teja
Ashok Urlana
Pruthwik Mishra
132
0
0
09 Aug 2025
Mask & Match: Learning to Recognize Handwritten Math with Self-Supervised Attention
Shree Mitra
Ritabrata Chakraborty
Nilkanta Sahu
107
0
0
08 Aug 2025
X-SAM: From Segment Anything to Any Segmentation
Hao Wang
Limeng Qiao
Zequn Jie
Zhijian Huang
Chengjian Feng
Qingfang Zheng
Lin Ma
X. Lan
Xiaodan Liang
VLM
129
6
0
06 Aug 2025
Excavate the potential of Single-Scale Features: A Decomposition Network for Water-Related Optical Image Enhancement
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IEEE J-STARS), 2025
Zheng Cheng
Wenri Wang
Guangyong Chen
Yakun Ju
Yihua Cheng
Zhisong Liu
Yanda Meng
Jintao Song
96
0
0
06 Aug 2025
AttZoom: Attention Zoom for Better Visual Features
Daniel DeAlcala
Aythami Morales
Julian Fierrez
Ruben Tolosana
174
1
0
05 Aug 2025
SURE-Med: Systematic Uncertainty Reduction for Enhanced Reliability in Medical Report Generation
Yuhang Gu
Xingyu Hu
Yuyu Fan
Xulin Yan
Longhuan Xu
Peng peng
MedIm
112
0
0
03 Aug 2025
Referring Remote Sensing Image Segmentation with Cross-view Semantics Interaction Network
Jiaxing Yang
Lihe Zhang
Huchuan Lu
148
0
0
02 Aug 2025
From Image Captioning to Visual Storytelling
Admitos Passadakis
Yingjin Song
Albert Gatt
DiffM
218
0
0
31 Jul 2025
Modality-Aware Feature Matching: A Comprehensive Review of Single- and Cross-Modality Techniques
Weide Liu
Wei Zhou
Jun Liu
Ping Hu
Jun Cheng
Jungong Han
Weisi Lin
3DV
211
3
0
30 Jul 2025
When Better Eyes Lead to Blindness: A Diagnostic Study of the Information Bottleneck in CNN-LSTM Image Captioning Models
International Journal of Computer Applications (IJCA), 2025
Hitesh Kumar Gupta
VLM
202
0
0
24 Jul 2025
Failure Prediction in Conversational Recommendation Systems
ACM Conference on Recommender Systems (RecSys), 2025
Maria Vlachou
109
0
0
23 Jul 2025
OrdShap: Feature Position Importance for Sequential Black-Box Models
Davin Hill
Brian L. Hill
A. Masoomi
Vijay S. Nori
Robert E. Tillman
Jennifer Dy
FAtt
303
0
0
16 Jul 2025
Domain-Adaptive Small Language Models for Structured Tax Code Prediction
Souvik Nath
Sumit Wadhwa
Luis Perez
169
0
0
15 Jul 2025
Cross-Modal Dual-Causal Learning for Long-Term Action Recognition
Xu Shaowu
Jia Xibin
Gao Junyu
Sun Qianmei
Chang Jing
Fan Chao
205
0
0
09 Jul 2025