Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015

Jimmy Ba

Aaron Courville

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,507 papers shown

Title
Performance Analysis of Traditional VQA Models Under Limited Computational Resources Jihao Gu 42 0 0 09 Feb 2025
Using Large Language Models for education managements in Vietnamese with low resources Duc Do Minh Vinh Nguyen Van Thang Dam Cong 38 0 0 28 Jan 2025
A Study of the Plausibility of Attention between RNN Encoders in Natural Language Inference Duc Hau Nguyen Pascale Sébillot 42 5 0 23 Jan 2025
The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering Anupam Pandey Deepjyoti Bodo Arpan Phukan Asif Ekbal 36 0 0 13 Jan 2025
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving S. Chen Yuxiao Luo Yue Ma Yu Qiao Yali Wang Mamba 42 1 0 08 Jan 2025
GIT-CXR: End-to-End Transformer for Chest X-Ray Report Generation Iustin Sîrbu Iulia-Renata Sîrbu Jasmina Bogojeska Traian Rebedea MedIm ViT LM&MA 31 0 0 05 Jan 2025
Classifier-Guided Captioning Across Modalities Ariel Shaulov Tal Shaharabany E. Shaar Gal Chechik Lior Wolf 28 0 0 03 Jan 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning Jianjie Luo Jingwen Chen Yehao Li Yingwei Pan Jianlin Feng Hongyang Chao Ting Yao DiffM VLM 45 0 0 03 Jan 2025
Reframing Image Difference Captioning with BLIP2IDC and Synthetic Augmentation Gautier Evennou Antoine Chaffin Vivien Chappelier Ewa Kijak DiffM 68 0 0 20 Dec 2024
Automated Image Captioning with CNNs and Transformers Joshua Adrian Cahyono Jeremy Nathan Jusuf VLM ViT 75 0 0 13 Dec 2024
Advancing Attribution-Based Neural Network Explainability through Relative Absolute Magnitude Layer-Wise Relevance Propagation and Multi-Component Evaluation Davor Vukadin Petar Afrić Marin Šilić Goran Delač FAtt 93 2 0 12 Dec 2024
Automated Medical Report Generation for ECG Data: Bridging Medical Text and Signal Processing with Deep Learning Amnon Bleich A. Linnemann B. Diem Tim Conrad MedIm 65 2 0 05 Dec 2024
Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis Po-Hsuan Huang Jeng-Lin Li Chin-Po Chen Ming-Ching Chang Wei-Chao Chen LRM 74 1 0 04 Dec 2024
Was that Sarcasm?: A Literature Survey on Sarcasm Detection Harleen Kaur Bagga Jasmine Bernard Sahil Shaheen Sarthak Arora 64 0 0 30 Nov 2024
VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis Donggoo Kang Dasol Jeong Hyunmin Lee Sangwoo Park Hasil Park Sunkyu Kwon Yeongjoon Kim Joonki Paik MLLM VLM 69 0 0 27 Nov 2024
GeoFormer: A Multi-Polygon Segmentation Transformer Maxim Khomiakov Michael Riis Andersen J. Frellsen 68 0 0 25 Nov 2024
Can Reasons Help Improve Pedestrian Intent Estimation? A Cross-Modal Approach Vaishnavi Khindkar V. Balasubramanian Chetan Arora A. Subramanian C. V. Jawahar 69 0 0 20 Nov 2024
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model Dongyoung Go Taesun Whang Chanhee Lee Hwayeon Kim Sunghoon Park Seunghwan Ji Dongchan Kim Young-Bum Kim LRM 148 1 0 19 Nov 2024
Anatomy-Guided Radiology Report Generation with Pathology-Aware Regional Prompts Yijian Gao D. C. Marshall Xiaodan Xing Junzhi Ning G. Papanastasiou G. Yang M. Komorowski MedIm 24 0 0 16 Nov 2024
SASE: A Searching Architecture for Squeeze and Excitation Operations Hanming Wang Yunlong Li Zijun Wu Huifen Wang Yuan Zhang 3DPC 21 0 0 13 Nov 2024
Multi-Modal interpretable automatic video captioning Antoine Hanna-Asaad Decky Aspandi Titus Zaharia 31 0 0 11 Nov 2024
Extended multi-stream temporal-attention module for skeleton-based human action recognition (HAR) Faisal Mehmood Xin Guo Enqing Chen Muhammad Azeem Akbar A. Khan Sami Ullah 23 4 0 10 Nov 2024
Generalization and Risk Bounds for Recurrent Neural Networks Xuewei Cheng Ke Huang Shujie Ma 24 1 0 05 Nov 2024
FactorizePhys: Matrix Factorization for Multidimensional Attention in Remote Physiological Sensing Jitesh Joshi Sos S. Agaian Youngjun Cho AI4TS 39 1 0 03 Nov 2024
Semi-supervised Chinese Poem-to-Painting Generation via Cycle-consistent Adversarial Networks Zhengyang Lu Tianhao Guo Feng Wang GAN 24 1 0 25 Oct 2024
Anomaly Resilient Temporal QoS Prediction using Hypergraph Convoluted Transformer Network Suraj Kumar S. Chattopadhyay Chandranath Adak 26 0 0 23 Oct 2024
PromptExp: Multi-granularity Prompt Explanation of Large Language Models Ximing Dong Shaowei Wang Dayi Lin Gopi Krishnan Rajbahadur Boquan Zhou Shichao Liu Ahmed E. Hassan AAML LRM 23 1 0 16 Oct 2024
HASN: Hybrid Attention Separable Network for Efficient Image Super-resolution Weifeng Cao Xiaoyan Lei Jun Shi Wanyong Liang Jie Liu Zongfei Bai SupR 24 0 0 13 Oct 2024
Continuous Risk Prediction Yi Dai 15 1 0 12 Oct 2024
Multimodal Clickbait Detection by De-confounding Biases Using Causal Representation Inference Jianxing Yu Shiqi Wang Han Yin Zhenlong Sun Ruobing Xie Bo Zhang Yanghui Rao CML 30 0 0 10 Oct 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training Sara Sarto Nicholas Moratelli Marcella Cornia Lorenzo Baraldi Rita Cucchiara 28 3 0 09 Oct 2024
Demonstration Based Explainable AI for Learning from Demonstration Methods Morris Gu Elizabeth Croft Dana Kulic 18 0 0 08 Oct 2024
CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection Devank Jayateja Kalla Soma Biswas 34 1 0 06 Oct 2024
BadCM: Invisible Backdoor Attack Against Cross-Modal Learning Zheng Zhang Xu Yuan Lei Zhu Jingkuan Song Liqiang Nie AAML 34 11 0 03 Oct 2024
Facial Action Unit Detection by Adaptively Constraining Self-Attention and Causally Deconfounding Sample Zhiwen Shao Hancheng Zhu Yong Zhou Xiang Xiang Bing-Quan Liu Rui Yao Lizhuang Ma CML 19 2 0 02 Oct 2024
softmax is not enough (for sharp out-of-distribution) Petar Veličković Christos Perivolaropoulos Federico Barbero Razvan Pascanu 37 17 0 01 Oct 2024
DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation Yi-Hao Peng Faria Huq Yue Jiang Jason Wu Amanda Li Jeffrey P. Bigham Amy Pavel DiffM 25 4 0 30 Sep 2024
See Detail Say Clear: Towards Brain CT Report Generation via Pathological Clue-driven Representation Learning Chengxin Zheng Junzhong Ji Yanzhao Shi Xiaodan Zhang Liangqiong Qu 3DV MedIm 24 3 0 29 Sep 2024
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning Kazuki Matsuda Yuiga Wada Komei Sugiura 26 1 0 28 Sep 2024
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning Soeun Lee Si-Woo Kim Taewhan Kim Dong-Jin Kim CLIP VLM 26 0 0 26 Sep 2024
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models Shengsheng Qian Zuyi Zhou Dizhan Xue Bing Wang Changsheng Xu LRM 34 1 0 19 Sep 2024
OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities Bilal Faye Hanane Azzag M. Lebbah ObjD 28 0 0 17 Sep 2024
PROSE-FD: A Multimodal PDE Foundation Model for Learning Multiple Operators for Forecasting Fluid Dynamics Yuxuan Liu Jingmin Sun Xinjie He Griffin Pinney Zecheng Zhang Hayden Schaeffer AI4CE 35 5 0 15 Sep 2024
KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models Yingshu Li Zhanyu Wang Yunyi Liu Lei Wang Lingqiao Liu Luping Zhou 33 3 0 09 Sep 2024
FODA-PG for Enhanced Medical Imaging Narrative Generation: Adaptive Differentiation of Normal and Abnormal Attributes Kai Shu Yuzhuo Jia Ziyang Zhang Jiechao Gao MedIm 24 0 0 06 Sep 2024
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval Leqi Shen Tianxiang Hao Tao He Sicheng Zhao Pengzhang Liu Yongjun Bao Guiguang Ding 109 7 0 02 Sep 2024
See or Guess: Counterfactually Regularized Image Captioning Qian Cao Xu Chen Ruihua Song Xiting Wang Xinting Huang Yuchen Ren CML 29 1 0 29 Aug 2024
Pixels to Prose: Understanding the art of Image Captioning Hrishikesh Singh Aarti Sharma Millie Pant 3DV VLM 25 0 0 28 Aug 2024
Graph Attention Inference of Network Topology in Multi-Agent Systems Akshay Kolli Reza Azadeh Kshitj Jerath GNN 14 1 0 27 Aug 2024
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization Nicholas Moratelli Davide Caffagni Marcella Cornia Lorenzo Baraldi Rita Cucchiara CLIP 29 3 0 26 Aug 2024