1502.03044
Cited By
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
10 February 2015
Kelvin Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
50 / 3,580 papers shown
Improving Face Recognition from Caption Supervision with Multi-Granular Contextual Feature Aggregation
Md Golam Moula Mehedi Hasan
Nasser M. Nasrabadi
CVBM
85
2
0
13 Aug 2023
Benign Shortcut for Debiasing: Fair Visual Recognition via Intervention with Shortcut Features
ACM Multimedia (ACM MM), 2023
Yi Zhang
Jitao Sang
Junyan Wang
Shihong Deng
Yaowei Wang
178
9
0
13 Aug 2023
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
International Conference on Learning Representations (ICLR), 2023
Juncheng Li
Kaihang Pan
Zhiqi Ge
Minghe Gao
Wei Ji
Wenqiao Zhang
Tat-Seng Chua
Siliang Tang
Hanwang Zhang
Yueting Zhuang
MLLM
284
89
0
08 Aug 2023
D-Score: A Synapse-Inspired Approach for Filter Pruning
Doyoung Park
Jinsoo Kim
Ji-Min Nam
Jooyoung Chang
S. Park
102
0
0
08 Aug 2023
Asynchronous Evolution of Deep Neural Network Architectures
Applied Soft Computing (Appl. Soft Comput.), 2023
J. Liang
Hormoz Shahrzad
Risto Miikkulainen
310
1
0
08 Aug 2023
A Comprehensive Analysis of Real-World Image Captioning and Scene Identification
Sai Suprabhanu Nallapaneni
Subrahmanyam Konakanchi
194
2
0
05 Aug 2023
Frustratingly Easy Model Generalization by Dummy Risk Minimization
Juncheng Wang
Yongfeng Zhang
Xixu Hu
Shujun Wang
Xingxu Xie
213
3
0
04 Aug 2023
Reverse Stable Diffusion: What prompt was used to generate this image?
Computer Vision and Image Understanding (CVIU), 2023
Florinel-Alin Croitoru
Vlad Hondru
Radu Tudor Ionescu
M. Shah
VLM
DiffM
276
10
0
02 Aug 2023
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model
ACM Multimedia (ACM MM), 2023
Ka Leong Cheng
Wenpo Song
Zheng Ma
Wenhao Zhu
Zi-Yue Zhu
Jianbing Zhang
CLIP
VLM
170
18
0
02 Aug 2023
EEG-based Cognitive Load Classification using Feature Masked Autoencoding and Emotion Transfer Learning
International Conference on Multimodal Interaction (ICMI), 2023
Dustin Pulver
Prithila Angkan
Paul Hungler
Ali Etemad
261
15
0
01 Aug 2023
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning
IEEE International Conference on Computer Vision (ICCV), 2023
Junjie Fei
Teng Wang
Jinrui Zhang
Zhenyu He
Chengjie Wang
Feng Zheng
VLM
161
64
0
31 Jul 2023
Triple Correlations-Guided Label Supplementation for Unbiased Video Scene Graph Generation
ACM Multimedia (ACM MM), 2023
Wenqing Wang
Kaifeng Gao
Yawei Luo
Tao Jiang
Fei Gao
Jian Shao
Jianwen Sun
Jun Xiao
226
5
0
30 Jul 2023
DRL4Route: A Deep Reinforcement Learning Framework for Pick-up and Delivery Route Prediction
Knowledge Discovery and Data Mining (KDD), 2023
Xiaowei Mao
Haomin Wen
Hengrui Zhang
Huaiyu Wan
Lixia Wu
Jianbin Zheng
Haoyuan Hu
Youfang Lin
AI4TS
248
17
0
30 Jul 2023
Synaptic Plasticity Models and Bio-Inspired Unsupervised Deep Learning: A Survey
Gabriele Lagani
Fabrizio Falchi
Claudio Gennaro
Giuseppe Amato
AAML
236
8
0
30 Jul 2023
RSGPT: A Remote Sensing Vision Language Model and Benchmark
ISPRS Journal of Photogrammetry and Remote Sensing (ISPRS J. Photogramm. Remote Sens.), 2023
Yuan Hu
Jianlong Yuan
Congcong Wen
Xiaonan Lu
Xiang Li
VLM
265
205
0
28 Jul 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Kun Yuan
V. Srivastav
Tong Yu
Joël L. Lavanchy
J. Marescaux
Pietro Mascagni
Nassir Navab
N. Padoy
691
44
0
27 Jul 2023
Fact-Checking of AI-Generated Reports
Razi Mahmood
Diego Machado Reyes
Ge Wang
Mannudeep Kalra
Pingkun Yan
MedIm
191
8
0
27 Jul 2023
On the Learning Dynamics of Attention Networks
European Conference on Artificial Intelligence (ECAI), 2023
Rahul Vashisht
H. G. Ramaswamy
281
1
0
25 Jul 2023
Enhancing image captioning with depth information using a Transformer-based framework
Aya Mahmoud Ahmed
Mohamed Yousef
K. Hussain
Yousef B. Mahdy
ViT
207
5
0
24 Jul 2023
Actor-agnostic Multi-label Action Recognition with Multi-modal Query
Anindya Mondal
Sauradip Nag
J. Prada
Xiatian Zhu
Anjan Dutta
253
14
0
20 Jul 2023
Class Attention to Regions of Lesion for Imbalanced Medical Image Recognition
Neurocomputing (Neurocomputing), 2023
Jia-Xin Zhuang
Jiabin Cai
Jianguo Zhang
Wei-Shi Zheng
Ruixuan Wang
191
20
0
19 Jul 2023
Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning
IEEE Transactions on Multimedia (IEEE TMM), 2023
Zijie Song
Zhenzhen Hu
Yuanen Zhou
Ye Zhao
Richang Hong
Meng Wang
203
18
0
19 Jul 2023
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Chaoyang Zhu
Long Chen
ObjD
VLM
507
67
0
18 Jul 2023
Human Action Recognition in Still Images Using ConViT
Seyed Rohollah Hosseyni
Sanaz Seyedin
Hasan Taheri
ViT
179
2
0
18 Jul 2023
GenAssist: Making Image Generation Accessible
ACM Symposium on User Interface Software and Technology (UIST), 2023
Mina Huh
Yi-Hao Peng
Amy Pavel
DiffM
212
54
0
14 Jul 2023
AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes
Guoyun Tu
Ying Liu
Vladimir Vlassov
258
1
0
14 Jul 2023
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Neural Information Processing Systems (NeurIPS), 2023
Yiren Jian
Chongyang Gao
Soroush Vosoughi
VLM
MLLM
380
44
0
13 Jul 2023
Is Task-Agnostic Explainable AI a Myth?
Alicja Chaszczewicz
221
2
0
13 Jul 2023
Reading Radiology Imaging Like The Radiologist
Yuhao Wang
MedIm
237
0
0
12 Jul 2023
DyCL: Dynamic Neural Network Compilation Via Program Rewriting and Graph Optimization
International Symposium on Software Testing and Analysis (ISSTA), 2023
Simin Chen
Shiyi Wei
Cong Liu
Wei Yang
176
11
0
11 Jul 2023
Undecimated Wavelet Transform for Word Embedded Semantic Marginal Autoencoder in Security improvement and Denoising different Languages
S. Shreyanth
47
0
0
06 Jul 2023
Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Bang-ju Yang
Fenglin Liu
Zheng Li
Qingyu Yin
Chenyu You
Bing Yin
Yuexian Zou
VLM
208
6
0
05 Jul 2023
Seeing in Words: Learning to Classify through Language Bottlenecks
Khalid Saifullah
Yuxin Wen
Jonas Geiping
Micah Goldblum
Tom Goldstein
VLM
133
2
0
29 Jun 2023
Variational latent discrete representation for time series modelling
Max H. Cohen
M. Charbit
Sylvain Le Corff
275
1
0
27 Jun 2023
Self-Supervised Image Captioning with CLIP
Chuanyang Jin
VLM
SSL
209
3
0
26 Jun 2023
Improving Reference-based Distinctive Image Captioning with Contrastive Rewards
Yangjun Mao
Jun Xiao
Dong Zhang
Meng Cao
Jian Shao
Yueting Zhuang
Long Chen
EGVM
200
10
0
25 Jun 2023
Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation
Neural Information Processing Systems (NeurIPS), 2023
Zihao Yue
Anwen Hu
Liang Zhang
Qin Jin
339
7
0
23 Jun 2023
Dense Video Object Captioning from Disjoint Supervision
International Conference on Learning Representations (ICLR), 2023
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
286
7
0
20 Jun 2023
KiUT: Knowledge-injected U-Transformer for Radiology Report Generation
Computer Vision and Pattern Recognition (CVPR), 2023
Zhongzhen Huang
Xiaofan Zhang
Shaoting Zhang
MedIm
254
92
0
20 Jun 2023
GraphGLOW: Universal and Generalizable Structure Learning for Graph Neural Networks
Knowledge Discovery and Data Mining (KDD), 2023
Wentao Zhao
Qitian Wu
Chenxiao Yang
Junchi Yan
184
19
0
20 Jun 2023
Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph Generation
International Conference on Multimedia Retrieval (ICMR), 2023
Shuo Chen
Yingjun Du
Pascal Mettes
Cees G. M. Snoek
OffRL
298
6
0
16 Jun 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
Lingxi Xie
Longhui Wei
Xiaopeng Zhang
Kaifeng Bi
Xiaotao Gu
Jianlong Chang
Qi Tian
250
9
0
14 Jun 2023
Top-Down Framework for Weakly-supervised Grounded Image Captioning
Chen Cai
Suchen Wang
Kim-Hui Yap
Yi Wang
ObjD
226
5
0
13 Jun 2023
Multimodal Explainable Artificial Intelligence: A Comprehensive Review of Methodological Advances and Future Research Directions
IEEE Access (IEEE Access), 2023
N. Rodis
Christos Sardianos
Panagiotis I. Radoglou-Grammatikis
Panagiotis G. Sarigiannidis
Iraklis Varlamis
Georgios Th. Papadopoulos
333
38
0
09 Jun 2023
Customizing General-Purpose Foundation Models for Medical Report Generation
Bang-ju Yang
Asif Raza
Yuexian Zou
Tong Zhang
MedIm
173
14
0
09 Jun 2023
Object Detection with Transformers: A Review
Italian National Conference on Sensors (INS), 2023
Tahira Shehzadi
K. Hashmi
D. Stricker
Muhammad Zeshan Afzal
ViT
MU
413
53
0
07 Jun 2023
Towards Adaptable and Interactive Image Captioning with Data Augmentation and Episodic Memory
Aliki Anagnostopoulou
Mareike Hartmann
Daniel Sonntag
CLL
VLM
187
1
0
06 Jun 2023
Putting Humans in the Image Captioning Loop
Aliki Anagnostopoulou
Mareike Hartmann
Daniel Sonntag
VLM
131
3
0
06 Jun 2023
On the Role of Attention in Prompt-tuning
International Conference on Machine Learning (ICML), 2023
Samet Oymak
A. S. Rawat
Mahdi Soltanolkotabi
Christos Thrampoulidis
MLT
LRM
206
59
0
06 Jun 2023
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning
Jianghui Wang
Yuxuan Wang
Dongyan Zhao
Zilong Zheng
342
1
0
04 Jun 2023