Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015
Kelvin Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
    DiffM

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,578 papers shown
Federated Multimodal Learning with Dual Adapters and Selective Pruning for Communication and Computational Efficiency
IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2025
Duy Phuong Nguyen
J. P. Muñoz
Tanya Roosta
Ali Jannesari
FedML
212
2
0
10 Mar 2025
A Woman with a Knife or A Knife with a Woman? Measuring Directional Bias Amplification in Image Captions
Rahul Nair
Bhanu Tokas
Neel Shah
335
0
0
10 Mar 2025
TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems
Khang H. N. Vo
D. Q. Nguyen
T. Nguyen
Tho Quan
234
5
0
09 Mar 2025
MSConv: Multiplicative and Subtractive Convolution for Face Recognition
Si Zhou
Yain-Whar Si
Xiaochen Yuan
Xiaofan Li
Xiaoxiang Liu
Xinyuan Zhang
Cong Lin
Xueyuan Gong
CVBM
272
0
0
08 Mar 2025
Extracting Symbolic Sequences from Visual Representations via Self-Supervised Learning
Victor Sebastian Martinez Pozos
Ivan Vladimir Meza Ruiz
156
0
0
06 Mar 2025
Cross-modal Causal Relation Alignment for Video Question Grounding
Computer Vision and Pattern Recognition (CVPR), 2025
Weixing Chen
Wenshu Fan
Binglin Chen
Jiandong Su
Yongsen Zheng
Guanbin Li
BDL, VGen, CML
262
7
0
05 Mar 2025
Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA
Medical Image Analysis (MedIA), 2025
Z. Zhong
Yuli Wang
Lulu Bi
Zhuoqi Ma
S. H. Ahn
...
Webster Stayman
Todd M. Kolb
I. Kamel
Harrison X. Bai
Zhicheng Jiao
LM&MA
205
0
0
03 Mar 2025
AC-Lite : A Lightweight Image Captioning Model for Low-Resource Assamese Language
Pankaj Choudhury
Yogesh Aggarwal
Prabhanjan Jadhav
Prithwijit Guha
Sukumar Nandi
335
0
0
03 Mar 2025
A Survey of Link Prediction in Temporal Networks
Jiafeng Xiong
Ahmad Zareie
Rizos Sakellariou
AI4TS, AI4CE
181
6
0
28 Feb 2025
Beyond RNNs: Benchmarking Attention-Based Image Captioning Models
Hemanth Teja Yanambakkam
Rahul Chinthala
91
0
0
26 Feb 2025
Omni-SILA: Towards Omni-scene Driven Visual Sentiment Identifying, Locating and Attributing in Videos
The Web Conference (WWW), 2025
Jiamin Luo
Jingjing Wang
Junxiao Ma
Yujie Jin
Shoushan Li
Guodong Zhou
244
1
0
26 Feb 2025
Grad-ECLIP: Gradient-based Visual and Textual Explanations for CLIP
Chenyang Zhao
Kun Wang
J. H. Hsiao
Antoni B. Chan
CLIP
247
6
0
26 Feb 2025
Good Representation, Better Explanation: Role of Convolutional Neural Networks in Transformer-Based Remote Sensing Image Captioning
Swadhin Das
Saarthak Gupta
Kamal Kumar
Raksha Sharma
135
2
0
22 Feb 2025
CAPability: A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness
Zhihang Liu
Chen-Wei Xie
Bin Wen
Feiwu Yu
Jixuan Chen
...
Nianzu Yang
Yinglu Li
Zuan Gao
Yun Zheng
Hongtao Xie
VLM, CoGe
430
0
0
19 Feb 2025
A Comprehensive Survey on Composed Image Retrieval
Xuemeng Song
Haoqiang Lin
Haokun Wen
Bohan Hou
Mingzhu Xu
Liqiang Nie
421
7
0
19 Feb 2025
Performance Analysis of Traditional VQA Models Under Limited Computational Resources
Jihao Gu
250
1
0
09 Feb 2025
Using Large Language Models for education managements in Vietnamese with low resources
Pacific Asia Conference on Language, Information and Computation (PACLIC), 2025
Duc Do Minh
Vinh Nguyen Van
Thang Dam Cong
244
2
0
28 Jan 2025
A Study of the Plausibility of Attention between RNN Encoders in Natural Language Inference
International Conference on Machine Learning and Applications (ICMLA), 2021
Duc Hau Nguyen
Pascale Sébillot
192
6
0
23 Jan 2025
The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering
Anupam Pandey
Deepjyoti Bodo
Arpan Phukan
Asif Ekbal
379
2
0
13 Jan 2025
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
AAAI Conference on Artificial Intelligence (AAAI), 2025
Tian Jin
Yuxiao Luo
Yue Ma
Yu Qiao
Yali Wang
Mamba
238
5
0
08 Jan 2025
GIT-CXR: End-to-End Transformer for Chest X-Ray Report Generation
Iustin Sîrbu
Iulia-Renata Sîrbu
Jasmina Bogojeska
Traian Rebedea
MedIm, ViT, LM&MA
168
3
0
05 Jan 2025
Classifier-Guided Captioning Across Modalities
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Ariel Shaulov
Tal Shaharabany
E. Shaar
Gal Chechik
Lior Wolf
197
0
0
03 Jan 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
European Conference on Computer Vision (ECCV), 2024
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM, VLM
241
1
0
03 Jan 2025
Real-time Bangla Sign Language Translator
Rotan Hawlader Pranto
Shahnewaz Siddique
SLR
140
2
0
21 Dec 2024
Reframing Image Difference Captioning with BLIP2IDC and Synthetic Augmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Gautier Evennou
Antoine Chaffin
Vivien Chappelier
Ewa Kijak
DiffM
236
1
0
20 Dec 2024
Automated Image Captioning with CNNs and Transformers
Joshua Adrian Cahyono
Jeremy Nathan Jusuf
VLM, ViT
119
1
0
13 Dec 2024
Advancing Attribution-Based Neural Network Explainability through Relative Absolute Magnitude Layer-Wise Relevance Propagation and Multi-Component Evaluation
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2024
Davor Vukadin
Petar Afrić
Marin Šilić
Goran Delač
FAtt
231
2
0
12 Dec 2024
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
Computer Vision and Pattern Recognition (CVPR), 2024
Bo Tong
Bokai Lai
Weihao Ye
Gen Luo
Chunjiang Ge
Ke Li
Xiaoshuai Sun
Rongrong Ji
VLM, MLLM
185
4
0
05 Dec 2024
Automated Medical Report Generation for ECG Data: Bridging Medical Text and Signal Processing with Deep Learning
Amnon Bleich
A. Linnemann
B. Diem
Tim Conrad
MedIm
191
4
0
05 Dec 2024
Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Po-Hsuan Huang
Jeng-Lin Li
Chin-Po Chen
Ming-Ching Chang
Wei-Chao Chen
LRM
257
4
0
04 Dec 2024
Was that Sarcasm?: A Literature Survey on Sarcasm Detection
Harleen Kaur Bagga
Jasmine Bernard
Sahil Shaheen
Sarthak Arora
157
1
0
30 Nov 2024
Detailed Object Description with Controllable Dimensions
IEEE Transactions on Multimedia (IEEE TMM), 2024
Xinran Wang
Hao Zhang
Baoteng Li
Kongming Liang
Hao Sun
Zhongjiang He
Tianhao Shen
Jun Guo
281
1
0
28 Nov 2024
VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis
Donggoo Kang
Dasol Jeong
Hyunmin Lee
Sangwoo Park
Hasil Park
Sunkyu Kwon
Yeongjoon Kim
Joonki Paik
MLLM, VLM
308
1
0
27 Nov 2024
GeoFormer: A Multi-Polygon Segmentation Transformer
British Machine Vision Conference (BMVC), 2024
Maxim Khomiakov
Michael Riis Andersen
J. Frellsen
183
1
0
25 Nov 2024
Can Reasons Help Improve Pedestrian Intent Estimation? A Cross-Modal Approach
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024
Vaishnavi Khindkar
V. Balasubramanian
Chetan Arora
A. Subramanian
C. V. Jawahar
251
0
0
20 Nov 2024
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model
Dongyoung Go
Taesun Whang
Chanhee Lee
Hwayeon Kim
Sunghoon Park
Seunghwan Ji
Dongchan Kim
Young-Bum Kim
LRM
1.0K
1
0
19 Nov 2024
Anatomy-Guided Radiology Report Generation with Pathology-Aware Regional Prompts
Yijian Gao
D. C. Marshall
Xiaodan Xing
Junzhi Ning
G. Papanastasiou
G. Yang
M. Komorowski
MedIm
156
0
0
16 Nov 2024
SASE: A Searching Architecture for Squeeze and Excitation Operations
Hanming Wang
Yunlong Li
Zijun Wu
Huifen Wang
Yuan Zhang
3DPC
117
1
0
13 Nov 2024
Multi-Modal interpretable automatic video captioning
Antoine Hanna-Asaad
Decky Aspandi
Titus Zaharia
223
1
0
11 Nov 2024
Extended multi-stream temporal-attention module for skeleton-based human action recognition (HAR)
Computers in Human Behavior (CHB), 2024
Faisal Mehmood
Xin Guo
Enqing Chen
Muhammad Azeem Akbar
A. Khan
Sami Ullah
246
8
0
10 Nov 2024
Generalization and Risk Bounds for Recurrent Neural Networks
Xuewei Cheng
Ke Huang
Shujie Ma
295
1
0
05 Nov 2024
FactorizePhys: Matrix Factorization for Multidimensional Attention in Remote Physiological Sensing
Neural Information Processing Systems (NeurIPS), 2024
Jitesh Joshi
Sos S. Agaian
Youngjun Cho
AI4TS
270
6
0
03 Nov 2024
Semi-supervised Chinese Poem-to-Painting Generation via Cycle-consistent Adversarial Networks
Zhengyang Lu
Tianhao Guo
Feng Wang
GAN
145
7
0
25 Oct 2024
Anomaly Resilient Temporal QoS Prediction using Hypergraph Convoluted Transformer Network
Suraj Kumar
S. Chattopadhyay
Chandranath Adak
139
0
0
23 Oct 2024
PromptExp: Multi-granularity Prompt Explanation of Large Language Models
Ximing Dong
Shaowei Wang
Dayi Lin
Gopi Krishnan Rajbahadur
Boquan Zhou
Shichao Liu
Ahmed E. Hassan
AAML, LRM
335
4
0
16 Oct 2024
HASN: Hybrid Attention Separable Network for Efficient Image Super-resolution
The Visual Computer (VC), 2024
Weifeng Cao
Xiaoyan Lei
Jun Shi
Wanyong Liang
Jie Liu
Zongfei Bai
SupR
235
4
0
13 Oct 2024
Multimodal Clickbait Detection by De-confounding Biases Using Causal Representation Inference
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Jianxing Yu
Shiqi Wang
Han Yin
Zhenlong Sun
Ruobing Xie
Bo Zhang
Yanghui Rao
CML
147
0
0
10 Oct 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training
International Journal of Computer Vision (IJCV), 2024
Sara Sarto
Nicholas Moratelli
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
227
9
0
09 Oct 2024
Demonstration Based Explainable AI for Learning from Demonstration Methods
IEEE Robotics and Automation Letters (RA-L), 2024
Morris Gu
Elizabeth Croft
Dana Kulic
147
1
0
08 Oct 2024
CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection
Asian Conference on Computer Vision (ACCV), 2024
Devank
Jayateja Kalla
Soma Biswas
138
5
0
06 Oct 2024