Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015

Jimmy Ba

Aaron Courville

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,580 papers shown

Improving Face Recognition from Caption Supervision with Multi-Granular Contextual Feature Aggregation

Md Golam Moula Mehedi Hasan

Nasser M. Nasrabadi

CVBM

13 Aug 2023

Benign Shortcut for Debiasing: Fair Visual Recognition via Intervention with Shortcut Features

ACM Multimedia (ACM MM), 2023

Yaowei Wang

178

13 Aug 2023

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions

International Conference on Learning Representations (ICLR), 2023

Wei Ji

284

08 Aug 2023

D-Score: A Synapse-Inspired Approach for Filter Pruning

102

08 Aug 2023

Asynchronous Evolution of Deep Neural Network Architectures

Applied Soft Computing (Appl. Soft Comput.), 2023

J. Liang

Hormoz Shahrzad

Risto Miikkulainen

310

08 Aug 2023

A Comprehensive Analysis of Real-World Image Captioning and Scene Identification

Sai Suprabhanu Nallapaneni

Subrahmanyam Konakanchi

194

05 Aug 2023

Frustratingly Easy Model Generalization by Dummy Risk Minimization

213

04 Aug 2023

Reverse Stable Diffusion: What prompt was used to generate this image?

Computer Vision and Image Understanding (CVIU), 2023

Florinel-Alin Croitoru

276

02 Aug 2023

Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model

ACM Multimedia (ACM MM), 2023

170

02 Aug 2023

EEG-based Cognitive Load Classification using Feature Masked Autoencoding and Emotion Transfer Learning

International Conference on Multimodal Interaction (ICMI), 2023

261

01 Aug 2023

Transferable Decoding with Visual Entities for Zero-Shot Image Captioning

IEEE International Conference on Computer Vision (ICCV), 2023

Chengjie Wang

161

31 Jul 2023

Triple Correlations-Guided Label Supplementation for Unbiased Video Scene Graph Generation

ACM Multimedia (ACM MM), 2023

Fei Gao

226

30 Jul 2023

DRL4Route: A Deep Reinforcement Learning Framework for Pick-up and Delivery Route Prediction

Knowledge Discovery and Data Mining (KDD), 2023

Xiaowei Mao

Haomin Wen

248

30 Jul 2023

Synaptic Plasticity Models and Bio-Inspired Unsupervised Deep Learning: A Survey

236

30 Jul 2023

RSGPT: A Remote Sensing Vision Language Model and Benchmark

ISPRS Journal of Photogrammetry and Remote Sensing (ISPRS J. Photogramm. Remote Sens.), 2023

265

205

28 Jul 2023

Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures

691

27 Jul 2023

Fact-Checking of AI-Generated Reports

191

27 Jul 2023

On the Learning Dynamics of Attention Networks

European Conference on Artificial Intelligence (ECAI), 2023

Rahul Vashisht

H. G. Ramaswamy

281

25 Jul 2023

Enhancing image captioning with depth information using a Transformer-based framework

207

24 Jul 2023

Actor-agnostic Multi-label Action Recognition with Multi-modal Query

253

20 Jul 2023

Class Attention to Regions of Lesion for Imbalanced Medical Image Recognition

Neurocomputing (Neurocomputing), 2023

191

19 Jul 2023

Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning

IEEE Transactions on Multimedia (IEEE TMM), 2023

Zijie Song

Zhenzhen Hu

Yuanen Zhou

Ye Zhao

Richang Hong

Meng Wang

203

19 Jul 2023

A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

Chaoyang Zhu

Long Chen

ObjD VLM

507

18 Jul 2023

Human Action Recognition in Still Images Using ConViT

Seyed Rohollah Hosseyni

Sanaz Seyedin

Hasan Taheri

ViT

179

18 Jul 2023

GenAssist: Making Image Generation Accessible

ACM Symposium on User Interface Software and Technology (UIST), 2023

212

14 Jul 2023

AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes

Guoyun Tu

Ying Liu

Vladimir Vlassov

258

14 Jul 2023

Bootstrapping Vision-Language Learning with Decoupled Language Pre-training

Neural Information Processing Systems (NeurIPS), 2023

380

13 Jul 2023

Is Task-Agnostic Explainable AI a Myth?

Alicja Chaszczewicz

221

13 Jul 2023

Reading Radiology Imaging Like The Radiologist

Yuhao Wang

MedIm

237

12 Jul 2023

DyCL: Dynamic Neural Network Compilation Via Program Rewriting and Graph Optimization

International Symposium on Software Testing and Analysis (ISSTA), 2023

176

11 Jul 2023

Undecimated Wavelet Transform for Word Embedded Semantic Marginal Autoencoder in Security improvement and Denoising different Languages

S. Shreyanth

06 Jul 2023

Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

208

05 Jul 2023

Seeing in Words: Learning to Classify through Language Bottlenecks

133

29 Jun 2023

Variational latent discrete representation for time series modelling

Max H. Cohen

M. Charbit

Sylvain Le Corff

275

27 Jun 2023

Self-Supervised Image Captioning with CLIP

Chuanyang Jin

VLM SSL

209

26 Jun 2023

Improving Reference-based Distinctive Image Captioning with Contrastive Rewards

200

25 Jun 2023

Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation

Neural Information Processing Systems (NeurIPS), 2023

Zihao Yue

Anwen Hu

Liang Zhang

Qin Jin

339

23 Jun 2023

Dense Video Object Captioning from Disjoint Supervision

International Conference on Learning Representations (ICLR), 2023

286

20 Jun 2023

KiUT: Knowledge-injected U-Transformer for Radiology Report Generation

Computer Vision and Pattern Recognition (CVPR), 2023

254

20 Jun 2023

GraphGLOW: Universal and Generalizable Structure Learning for Graph Neural Networks

Knowledge Discovery and Data Mining (KDD), 2023

184

20 Jun 2023

Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph Generation

International Conference on Multimedia Retrieval (ICMR), 2023

298

16 Jun 2023

Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models

Xiaotao Gu

250

14 Jun 2023

Top-Down Framework for Weakly-supervised Grounded Image Captioning

Yi Wang

226

13 Jun 2023

Multimodal Explainable Artificial Intelligence: A Comprehensive Review of Methodological Advances and Future Research Directions

IEEE Access (IEEE Access), 2023

N. Rodis

Christos Sardianos

Panagiotis I. Radoglou-Grammatikis

Panagiotis G. Sarigiannidis

Iraklis Varlamis

Georgios Th. Papadopoulos

333

09 Jun 2023

Customizing General-Purpose Foundation Models for Medical Report Generation

Tong Zhang

173

09 Jun 2023

Object Detection with Transformers: A Review

Italian National Conference on Sensors (INS), 2023

Tahira Shehzadi

K. Hashmi

D. Stricker

Muhammad Zeshan Afzal

ViT MU

413

07 Jun 2023

Towards Adaptable and Interactive Image Captioning with Data Augmentation and Episodic Memory

Aliki Anagnostopoulou

Mareike Hartmann

Daniel Sonntag

CLL VLM

187

06 Jun 2023

Putting Humans in the Image Captioning Loop

Aliki Anagnostopoulou

Mareike Hartmann

Daniel Sonntag

VLM

131

06 Jun 2023

On the Role of Attention in Prompt-tuning

International Conference on Machine Learning (ICML), 2023

Samet Oymak

A. S. Rawat

Mahdi Soltanolkotabi

Christos Thrampoulidis

MLT LRM

206

06 Jun 2023

MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning

342

04 Jun 2023