Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015

Jimmy Ba

Aaron Courville

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,578 papers shown

Federated Multimodal Learning with Dual Adapters and Selective Pruning for Communication and Computational Efficiency. IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2025. Duy Phuong Nguyen J. P. Muñoz Tanya Roosta Ali Jannesari FedML 212 2 0 10 Mar 2025
A Woman with a Knife or A Knife with a Woman? Measuring Directional Bias Amplification in Image Captions Rahul Nair Bhanu Tokas Neel Shah 335 0 0 10 Mar 2025
TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems Khang H. N. Vo D. Q. Nguyen T. Nguyen Tho Quan 234 5 0 09 Mar 2025
MSConv: Multiplicative and Subtractive Convolution for Face Recognition Si Zhou Yain-Whar Si Xiaochen Yuan Xiaofan Li Xiaoxiang Liu Xinyuan Zhang Cong Lin Xueyuan Gong CVBM 272 0 0 08 Mar 2025
Extracting Symbolic Sequences from Visual Representations via Self-Supervised Learning Victor Sebastian Martinez Pozos Ivan Vladimir Meza Ruiz 156 0 0 06 Mar 2025
Cross-modal Causal Relation Alignment for Video Question Grounding. Computer Vision and Pattern Recognition (CVPR), 2025. Weixing Chen Wenshu Fan Binglin Chen Jiandong Su Yongsen Zheng Guanbin Li BDL VGen CML 262 7 0 05 Mar 2025
Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA. Medical Image Analysis (MedIA), 2025. Z. Zhong Yuli Wang Lulu Bi Zhuoqi Ma S. H. Ahn ... Webster Stayman Todd M. Kolb I. Kamel Harrison X. Bai Zhicheng Jiao LM&MA 205 0 0 03 Mar 2025
AC-Lite: A Lightweight Image Captioning Model for Low-Resource Assamese Language Pankaj Choudhury Yogesh Aggarwal Prabhanjan Jadhav Prithwijit Guha Sukumar Nandi 335 0 0 03 Mar 2025
A Survey of Link Prediction in Temporal Networks Jiafeng Xiong Ahmad Zareie Rizos Sakellariou AI4TS AI4CE 181 6 0 28 Feb 2025
Beyond RNNs: Benchmarking Attention-Based Image Captioning Models Hemanth Teja Yanambakkam Rahul Chinthala 91 0 0 26 Feb 2025
Omni-SILA: Towards Omni-scene Driven Visual Sentiment Identifying, Locating and Attributing in Videos. The Web Conference (WWW), 2025. Jiamin Luo Jingjing Wang Junxiao Ma Yujie Jin Shoushan Li Guodong Zhou 244 1 0 26 Feb 2025
Grad-ECLIP: Gradient-based Visual and Textual Explanations for CLIP Chenyang Zhao Kun Wang J. H. Hsiao Antoni B. Chan CLIP 247 6 0 26 Feb 2025
Good Representation, Better Explanation: Role of Convolutional Neural Networks in Transformer-Based Remote Sensing Image Captioning Swadhin Das Saarthak Gupta and Kamal Kumar Raksha Sharma 135 2 0 22 Feb 2025
CAPability: A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness Zhihang Liu Chen-Wei Xie Bin Wen Feiwu Yu Jixuan Chen ... Nianzu Yang Yinglu Li Zuan Gao Yun Zheng Hongtao Xie VLM CoGe 430 0 0 19 Feb 2025
A Comprehensive Survey on Composed Image Retrieval Xuemeng Song Haoqiang Lin Haokun Wen Bohan Hou Mingzhu Xu Liqiang Nie 421 7 0 19 Feb 2025
Performance Analysis of Traditional VQA Models Under Limited Computational Resources Jihao Gu 250 1 0 09 Feb 2025
Using Large Language Models for education managements in Vietnamese with low resources. Pacific Asia Conference on Language, Information and Computation (PACLIC), 2025. Duc Do Minh Vinh Nguyen Van Thang Dam Cong 244 2 0 28 Jan 2025
A Study of the Plausibility of Attention between RNN Encoders in Natural Language Inference. International Conference on Machine Learning and Applications (ICMLA), 2021. Duc Hau Nguyen Pascale Sébillot 192 6 0 23 Jan 2025
The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering Anupam Pandey Deepjyoti Bodo Arpan Phukan Asif Ekbal 379 2 0 13 Jan 2025
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving. AAAI Conference on Artificial Intelligence (AAAI), 2025. Tian Jin Yuxiao Luo Yue Ma Yu Qiao Yali Wang Mamba 238 5 0 08 Jan 2025
GIT-CXR: End-to-End Transformer for Chest X-Ray Report Generation Iustin Sîrbu Iulia-Renata Sîrbu Jasmina Bogojeska Traian Rebedea MedIm ViT LM&MA 168 3 0 05 Jan 2025
Classifier-Guided Captioning Across Modalities. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025. Ariel Shaulov Tal Shaharabany E. Shaar Gal Chechik Lior Wolf 197 0 0 03 Jan 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning. European Conference on Computer Vision (ECCV), 2024. Jianjie Luo Jingwen Chen Yehao Li Yingwei Pan Jianlin Feng Hongyang Chao Ting Yao DiffM VLM 241 1 0 03 Jan 2025
Real-time Bangla Sign Language Translator Rotan Hawlader Pranto Shahnewaz Siddique SLR 140 2 0 21 Dec 2024
Reframing Image Difference Captioning with BLIP2IDC and Synthetic Augmentation. IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024. Gautier Evennou Antoine Chaffin Vivien Chappelier Ewa Kijak DiffM 236 1 0 20 Dec 2024
Automated Image Captioning with CNNs and Transformers Joshua Adrian Cahyono Jeremy Nathan Jusuf VLM ViT 119 1 0 13 Dec 2024
Advancing Attribution-Based Neural Network Explainability through Relative Absolute Magnitude Layer-Wise Relevance Propagation and Multi-Component Evaluation. ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2024. Davor Vukadin Petar Afrić Marin Šilić Goran Delač FAtt 231 2 0 12 Dec 2024
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression. Computer Vision and Pattern Recognition (CVPR), 2024. Bo Tong Bokai Lai Weihao Ye Gen Luo Chunjiang Ge Ke Li Xiaoshuai Sun Rongrong Ji VLM MLLM 185 4 0 05 Dec 2024
Automated Medical Report Generation for ECG Data: Bridging Medical Text and Signal Processing with Deep Learning Amnon Bleich A. Linnemann B. Diem Tim Conrad MedIm 191 4 0 05 Dec 2024
Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis. IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024. Po-Hsuan Huang Jeng-Lin Li Chin-Po Chen Ming-Ching Chang Wei-Chao Chen LRM 257 4 0 04 Dec 2024
Was that Sarcasm?: A Literature Survey on Sarcasm Detection Harleen Kaur Bagga Jasmine Bernard Sahil Shaheen Sarthak Arora 157 1 0 30 Nov 2024
Detailed Object Description with Controllable Dimensions. IEEE Transactions on Multimedia (IEEE TMM), 2024. Xinran Wang Hao Zhang Baoteng Li Kongming Liang Hao Sun Zhongjiang He Tianhao Shen Jun Guo 281 1 0 28 Nov 2024
VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis Donggoo Kang Dasol Jeong Hyunmin Lee Sangwoo Park Hasil Park Sunkyu Kwon Yeongjoon Kim Joonki Paik MLLM VLM 308 1 0 27 Nov 2024
GeoFormer: A Multi-Polygon Segmentation Transformer. British Machine Vision Conference (BMVC), 2024. Maxim Khomiakov Michael Riis Andersen J. Frellsen 183 1 0 25 Nov 2024
Can Reasons Help Improve Pedestrian Intent Estimation? A Cross-Modal Approach. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024. Vaishnavi Khindkar V. Balasubramanian Chetan Arora A. Subramanian C. V. Jawahar 251 0 0 20 Nov 2024
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model Dongyoung Go Taesun Whang Chanhee Lee Hwayeon Kim Sunghoon Park Seunghwan Ji Dongchan Kim Young-Bum Kim LRM 1.0K 1 0 19 Nov 2024
Anatomy-Guided Radiology Report Generation with Pathology-Aware Regional Prompts Yijian Gao D. C. Marshall Xiaodan Xing Junzhi Ning G. Papanastasiou G. Yang M. Komorowski MedIm 156 0 0 16 Nov 2024
SASE: A Searching Architecture for Squeeze and Excitation Operations Hanming Wang Yunlong Li Zijun Wu Huifen Wang Yuan Zhang 3DPC 117 1 0 13 Nov 2024
Multi-Modal interpretable automatic video captioning Antoine Hanna-Asaad Decky Aspandi Titus Zaharia 223 1 0 11 Nov 2024
Extended multi-stream temporal-attention module for skeleton-based human action recognition (HAR). Computers in Human Behavior (CHB), 2024. Faisal Mehmood Xin Guo Enqing Chen Muhammad Azeem Akbar A. Khan Sami Ullah 246 8 0 10 Nov 2024
Generalization and Risk Bounds for Recurrent Neural Networks Xuewei Cheng Ke Huang Shujie Ma 295 1 0 05 Nov 2024
FactorizePhys: Matrix Factorization for Multidimensional Attention in Remote Physiological Sensing. Neural Information Processing Systems (NeurIPS), 2024. Jitesh Joshi Sos S. Agaian Youngjun Cho AI4TS 270 6 0 03 Nov 2024
Semi-supervised Chinese Poem-to-Painting Generation via Cycle-consistent Adversarial Networks Zhengyang Lu Tianhao Guo Feng Wang GAN 145 7 0 25 Oct 2024
Anomaly Resilient Temporal QoS Prediction using Hypergraph Convoluted Transformer Network Suraj Kumar S. Chattopadhyay Chandranath Adak 139 0 0 23 Oct 2024
PromptExp: Multi-granularity Prompt Explanation of Large Language Models Ximing Dong Shaowei Wang Dayi Lin Gopi Krishnan Rajbahadur Boquan Zhou Shichao Liu Ahmed E. Hassan AAML LRM 335 4 0 16 Oct 2024
HASN: Hybrid Attention Separable Network for Efficient Image Super-resolution. The Visual Computer (VC), 2024. Weifeng Cao Xiaoyan Lei Jun Shi Wanyong Liang Jie Liu Zongfei Bai SupR 235 4 0 13 Oct 2024
Multimodal Clickbait Detection by De-confounding Biases Using Causal Representation Inference. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024. Jianxing Yu Shiqi Wang Han Yin Zhenlong Sun Ruobing Xie Bo Zhang Yanghui Rao CML 147 0 0 10 Oct 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training. International Journal of Computer Vision (IJCV), 2024. Sara Sarto Nicholas Moratelli Marcella Cornia Lorenzo Baraldi Rita Cucchiara 227 9 0 09 Oct 2024
Demonstration Based Explainable AI for Learning from Demonstration Methods. IEEE Robotics and Automation Letters (RA-L), 2024. Morris Gu Elizabeth Croft Dana Kulic 147 1 0 08 Oct 2024
CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection. Asian Conference on Computer Vision (ACCV), 2024. Devank Jayateja Kalla Soma Biswas 138 5 0 06 Oct 2024