Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015
Kelvin Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
    DiffM

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,580 papers shown
Semi-Supervised Panoptic Narrative Grounding
ACM Multimedia (ACM MM), 2023
Danni Yang
Jiayi Ji
Xiaoshuai Sun
Haowei Wang
Yinan Li
Yiwei Ma
Rongrong Ji
212
5
0
27 Oct 2023
Style-Aware Radiology Report Generation with RadGraph and Few-Shot Prompting
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Benjamin Yan
Ruochen Liu
David E. Kuo
Subathra Adithan
Eduardo Pontes Reis
...
V. Venugopal
Chloe P. O'Connell
Agustina Saenz
Pranav Rajpurkar
Michael Moor
MedIm
263
37
0
26 Oct 2023
Cross-modal Active Complementary Learning with Self-refining Correspondence
Neural Information Processing Systems (NeurIPS), 2023
Yang Qin
Yuan Sun
Dezhong Peng
Qiufeng Wang
Xiaocui Peng
Peng Hu
283
32
0
26 Oct 2023
FloCoDe: Unbiased Dynamic Scene Graph Generation with Temporal Consistency and Correlation Debiasing
Anant Khandelwal
451
2
0
24 Oct 2023
CPSeg: Finer-grained Image Semantic Segmentation via Chain-of-Thought Language Prompting
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Lei Li
246
35
0
24 Oct 2023
PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretraining
Kecen Li
Chen Gong
Zhixiang Li
Yuzhong Zhao
Xinwen Hou
Tianhao Wang
352
20
0
19 Oct 2023
Getting aligned on representational alignment
Ilia Sucholutsky
Lukas Muttenthaler
Adrian Weller
Andi Peng
Andreea Bobu
...
Thomas Unterthiner
Andrew Kyle Lampinen
Klaus-Robert Muller
M. Toneva
Thomas Griffiths
320
134
0
18 Oct 2023
Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World
International Conference on Learning Representations (ICLR), 2023
Rujie Wu
Xiaojian Ma
Zhenliang Zhang
Wei Wang
Qing Li
Song-Chun Zhu
Yizhou Wang
LRM, VLM
339
16
0
16 Oct 2023
Few-shot Action Recognition with Captioning Foundation Models
Xiang Wang
Shiwei Zhang
Hangjie Yuan
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
VLM
322
9
0
16 Oct 2023
Visual Question Generation in Bengali
Mahmud Hasan
Labiba Islam
J. Ruma
T. Mayeesha
Rashedur Rahman
229
1
0
12 Oct 2023
CLIP for Lightweight Semantic Segmentation
Chinese Conference on Pattern Recognition and Computer Vision (CPRCV), 2023
Ke Jin
Wankou Yang
VLM
171
2
0
11 Oct 2023
A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation
International Conference Robotics and Computer Vision (ICRCV), 2023
Rashid Khan
Bingding Huang
Haseeb Hassan
Asim Zaman
Z. Ye
156
3
0
11 Oct 2023
A Lightweight Video Anomaly Detection Model with Weak Supervision and Adaptive Instance Selection
Yang Wang
Jiaogen Zhou
Jihong Guan
304
12
0
09 Oct 2023
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
IEEE International Conference on Robotics and Automation (ICRA), 2023
Long Chen
Oleg Sinavski
Jan Hünermann
Alice Karnsund
Andrew James Willmott
Danny Birch
Daniel Maund
Jamie Shotton
MLLM
380
289
0
03 Oct 2023
Constructing Image-Text Pair Dataset from Books
Yamato Okamoto
Haruto Toyonaga
Yoshihisa Ijiri
Hirokatsu Kataoka
168
4
0
03 Oct 2023
Application of frozen large-scale models to multimodal task-oriented dialogue
Tatsuki Kawamoto
Takuma Suzuki
Ko Miyama
Takumi Meguro
Tomohiro Takagi
131
2
0
02 Oct 2023
YOLOR-Based Multi-Task Learning
Hung-Shuo Chang
Chien-Yao Wang
Hang Yan
Yukun Zhu
Hongpeng Liao
MoE, VLM
193
22
0
29 Sep 2023
PROSE: Predicting Operators and Symbolic Expressions using Multimodal Transformers
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
208
22
0
28 Sep 2023
XVO: Generalized Visual Odometry via Cross-Modal Self-Training
IEEE International Conference on Computer Vision (ICCV), 2023
Tohida Rehman
Ronit Mandal
Jimuyang Zhang
Debarshi Kumar Sanyal
SSL
359
25
0
28 Sep 2023
Social Media Fashion Knowledge Extraction as Captioning
Yifei Yuan
Wenxuan Zhang
Yang Deng
Wai Lam
176
2
0
28 Sep 2023
Attention Sorting Combats Recency Bias In Long Context Language Models
A. Peysakhovich
Adam Lerer
LRM, RALM
310
81
0
28 Sep 2023
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Avamarie Brueggeman
Andrea Madotto
Mohammad Kachuee
Tushar Nagarajan
Matt Smith
...
Peyman Heidari
Yue Liu
Kavya Srinet
Babak Damavandi
Anuj Kumar
MLLM
286
110
0
27 Sep 2023
CauDR: A Causality-inspired Domain Generalization Framework for Fundus-based Diabetic Retinopathy Grading
Hao Wei
Peilun Shi
Juzheng Miao
Minqing Zhang
Guitao Bai
Jianing Qiu
Furui Liu
Wu Yuan
MedIm, OOD
163
7
0
27 Sep 2023
FaceGemma: Enhancing Image Captioning with Facial Attributes for Portrait Images
Naimul Haque
Iffat Labiba
Sadia Akter
3DH, CVBM
201
2
0
24 Sep 2023
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai-Nguyen Nguyen
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
316
22
0
23 Sep 2023
An Empirical Study of Attention Networks for Semantic Segmentation
Hao Guo
Hongbiao Si
Guilin Jiang
Wei Zhang
Zhiyan Liu
Xuanyi Zhu
Xulong Zhang
Yang Liu
216
1
0
19 Sep 2023
R2GenGPT: Radiology Report Generation with Frozen LLMs
Zhanyu Wang
Lingqiao Liu
Lei Wang
Luping Zhou
MedIm, LM&MA, VLM
220
139
0
18 Sep 2023
A Novel Method of Fuzzy Topic Modeling based on Transformer Processing
Ching-Hsun Tseng
Shin-Jye Lee
Po-Wei Cheng
Chien Lee
Chih-Chieh Hung
112
0
0
18 Sep 2023
Holistic Geometric Feature Learning for Structured Reconstruction
IEEE International Conference on Computer Vision (ICCV), 2023
Ziqiong Lu
Linxi Huan
Qiyuan Ma
Xianwei Zheng
183
2
0
18 Sep 2023
Viewpoint Integration and Registration with Vision Language Foundation Model for Image Change Understanding
Xiaonan Lu
Jianlong Yuan
Ruigang Niu
Yuan Hu
Fan Wang
144
3
0
15 Sep 2023
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Minsu Kim
J. Choi
Soumi Maiti
Jeong Hun Yeo
Shinji Watanabe
Y. Ro
VLM
185
8
0
15 Sep 2023
PatFig: Generating Short and Long Captions for Patent Figures
Dana Aubakirova
Kim Gerdes
Lufei Liu
201
15
0
15 Sep 2023
A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time
Yeqi Gao
Zhao Song
Weixin Wang
Junze Yin
294
30
0
14 Sep 2023
Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Enna Sachdeva
Nakul Agarwal
Suhas Chundi
Sean Roelofs
Jiachen Li
Mykel Kochenderfer
Chiho Choi
Behzad Dariush
254
73
0
12 Sep 2023
SparseSwin: Swin Transformer with Sparse Transformer Block
Krisna Pinasthika
Blessius Sheldo Putra Laksono
Riyandi Banovbi Putera Irsal
Syifa’ Hukma Shabiyya
N. Yudistira
ViT
238
32
0
11 Sep 2023
C-CLIP: Contrastive Image-Text Encoders to Close the Descriptive-Commentative Gap
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
William Theisen
Walter J. Scheirer
CLIP, VLM
197
2
0
06 Sep 2023
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Noriyuki Kojima
Hadar Averbuch-Elor
Yoav Artzi
317
2
0
06 Sep 2023
Exchanging-based Multimodal Fusion with Transformer
Renyu Zhu
Chengcheng Han
Yong Qian
Qiushi Sun
Xiang Li
Ming Gao
Xuezhi Cao
Yunsen Xian
174
5
0
05 Sep 2023
Distraction-free Embeddings for Robust VQA
Atharvan Dogra
Deeksha Varshney
Ashwin Kalyan
Ameet Deshpande
Neeraj Kumar
194
0
0
31 Aug 2023
FIRE: Food Image to REcipe generation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
P. Chhikara
Dhiraj Chaurasia
Yifan Jiang
Omkar Masur
Filip Ilievski
265
35
0
28 Aug 2023
Goodhart's Law Applies to NLP's Explanation Benchmarks
Findings (Findings), 2023
Jennifer Hsia
Danish Pruthi
Aarti Singh
Zachary Chase Lipton
215
7
0
28 Aug 2023
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Bang-ju Yang
Fenglin Liu
X. Wu
Yaowei Wang
Xu Sun
Yuexian Zou
VLM, CLIP
212
20
0
25 Aug 2023
PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Haibo Jin
Haoxuan Che
Yi Lin
Haoxing Chen
MedIm
283
123
0
24 Aug 2023
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
IEEE International Conference on Computer Vision (ICCV), 2023
Manuele Barraco
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
180
30
0
23 Aug 2023
CgT-GAN: CLIP-guided Text GAN for Image Captioning
ACM Multimedia (ACM MM), 2023
Jiarui Yu
Haoran Li
Y. Hao
B. Zhu
Tong Xu
Xiangnan He
VLM, CLIP
217
24
0
23 Aug 2023
ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts
Future generations computer systems (FGCS), 2023
Bilel Benjdira
Anis Koubaa
Anas M. Ali
LM&Ro
183
10
0
22 Aug 2023
Explore and Tell: Embodied Visual Captioning in 3D Environments
IEEE International Conference on Computer Vision (ICCV), 2023
Anwen Hu
Shizhe Chen
Liang Zhang
Qin Jin
LM&Ro
189
3
0
21 Aug 2023
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
IEEE International Conference on Computer Vision (ICCV), 2023
Minsu Kim
Jeong Hun Yeo
J. Choi
Y. Ro
208
27
0
18 Aug 2023
Learning the meanings of function words from grounded language using a visual question answering model
Cognitive Sciences (CogSci), 2023
Eva Portelance
Michael C. Frank
Dan Jurafsky
NAI
271
7
0
16 Aug 2023
Visually-Aware Context Modeling for News Image Captioning
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Tingyu Qu
Tinne Tuytelaars
Marie-Francine Moens
VLM
130
17
0
16 Aug 2023