A Comprehensive Survey of Deep Learning for Image Captioning

6 October 2018

Papers citing "A Comprehensive Survey of Deep Learning for Image Captioning"

50 / 228 papers shown

Do DALL-E and Flamingo Understand Each Other? Hang Li Jindong Gu Rajat Koner Sahand Sharifzadeh Volker Tresp MLLM 16 12 0 23 Dec 2022
Towards Generating Diverse Audio Captions via Adversarial Training Xinhao Mei Xubo Liu Jianyuan Sun Mark D. Plumbley Wenwu Wang DiffM 33 2 0 05 Dec 2022
PLA: Language-Driven Open-Vocabulary 3D Scene Understanding Runyu Ding Jihan Yang Chuhui Xue Wenqing Zhang Song Bai Xiaojuan Qi VLM 15 146 0 29 Nov 2022
Deep representation learning: Fundamentals, Perspectives, Applications, and Open Challenges K. T. Baghaei Amirreza Payandeh Pooya Fayyazsanavi Shahram Rahimi Zhiqian Chen Somayeh Bakhtiari Ramezani FaML AI4TS 30 6 0 27 Nov 2022
Aesthetically Relevant Image Captioning Zhipeng Zhong Fei Zhou Guoping Qiu 31 9 0 25 Nov 2022
Feedback is Needed for Retakes: An Explainable Poor Image Notification Framework for the Visually Impaired Kazuya Ohata Shunsuke Kitada Hitoshi Iyatomi 14 0 0 17 Nov 2022
CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge Linli Yao Wei-Neng Chen Qin Jin VLM 22 10 0 17 Nov 2022
Vis2Mus: Exploring Multimodal Representation Mapping for Controllable Music Generation Runbang Zhang Yixiao Zhang Kai Shao Ying Shan Gus Xia 21 4 0 10 Nov 2022
CLSE: Corpus of Linguistically Significant Entities A. Chuklin Justin Zhao Mihir Kale 13 1 0 04 Nov 2022
Physical Adversarial Attack meets Computer Vision: A Decade Survey Hui Wei Hao Tang Xuemei Jia Zhixiang Wang Han-Bing Yu Zhubo Li Shin'ichi Satoh Luc Van Gool Zheng Wang AAML 27 43 0 30 Sep 2022
M^4I: Multi-modal Models Membership Inference Pingyi Hu Zihan Wang Ruoxi Sun Hu Wang Minhui Xue 37 26 0 15 Sep 2022
Cross Modal Compression: Towards Human-comprehensible Semantic Compression Jiguo Li Chuanmin Jia Xinfeng Zhang Siwei Ma Wen Gao 9 18 0 06 Sep 2022
Facial Expression Recognition and Image Description Generation in Vietnamese Khang Nhut Lam Kim Thi-Thanh Nguyen Loc Huu Nguy Jugal Kalita 3DH CVBM 15 1 0 12 Aug 2022
A Comprehensive Survey of Natural Language Generation Advances from the Perspective of Digital Deception Keenan I. Jones Enes ALTUNCU V. N. Franqueira Yi-Chia Wang Shujun Li DeLMO 34 3 0 11 Aug 2022
End-to-end deep learning for directly estimating grape yield from ground-based imagery A. Olenskyj B. Sams Zhenghao Fei Vishal Singh P. Raja G. Bornhorst J. M. Earles 26 28 0 04 Aug 2022
Visual Recognition by Request Chufeng Tang Lingxi Xie Xiaopeng Zhang Xiaolin Hu Qi Tian VLM 16 15 0 28 Jul 2022
Controllable Data Generation by Deep Learning: A Review Shiyu Wang Yuanqi Du Xiaojie Guo Bo Pan Zhaohui Qin Liang Zhao 29 28 0 19 Jul 2022
Relational Future Captioning Model for Explaining Likely Collisions in Daily Tasks Motonari Kambara K. Sugiura 17 6 0 19 Jul 2022
Exploring Adversarial Examples and Adversarial Robustness of Convolutional Neural Networks by Mutual Information Jiebao Zhang Wenhua Qian Ren-qi Nie Jinde Cao Dan Xu GAN AAML 17 0 0 12 Jul 2022
Vision-and-Language Pretraining Thong Nguyen Cong-Duy Nguyen Xiaobao Wu See-Kiong Ng A. Luu VLM CLIP 19 2 0 05 Jul 2022
Gender Artifacts in Visual Datasets Nicole Meister Dora Zhao Angelina Wang V. V. Ramaswamy Ruth C. Fong Olga Russakovsky 21 28 0 18 Jun 2022
Image Captioning based on Feature Refinement and Reflective Decoding G. Alabduljabbar Hafida Benhidour Said Kerrache 3DV 14 3 0 16 Jun 2022
Video-based Human-Object Interaction Detection from Tubelet Tokens Danyang Tu Wei Sun Xiongkuo Min Guangtao Zhai Wei Shen ViT 13 15 0 04 Jun 2022
A Generative Adversarial Network-based Selective Ensemble Characteristic-to-Expression Synthesis (SE-CTES) Approach and Its Applications in Healthcare Yuxuan Li Ying-Jia Lin Chenang Liu 23 0 0 29 May 2022
Prompt-based Learning for Unpaired Image Captioning Peipei Zhu Xiao Wang Lin Zhu Zhenglong Sun Weishi Zheng Yaowei Wang C. L. P. Chen VLM 21 31 0 26 May 2022
Beyond Greedy Search: Tracking by Multi-Agent Reinforcement Learning-based Beam Search Xiao Wang Zhe Chen Bo Jiang Jin Tang B. Luo Dacheng Tao 37 18 0 19 May 2022
Efficient Gesture Recognition for the Assistance of Visually Impaired People using Multi-Head Neural Networks Samer Alashhab Antonio Javier Gallego Miguel Ángel Lozano 19 16 0 14 May 2022
Translation between Molecules and Natural Language Carl N. Edwards T. Lai Kevin Ros Garrett Honke Kyunghyun Cho Heng Ji 25 155 0 25 Apr 2022
Visual Attention Methods in Deep Learning: An In-Depth Survey Mohammed Hassanin Saeed Anwar Ibrahim Radwan F. Khan Ajmal Saeed Mian 19 145 0 16 Apr 2022
Guiding Attention using Partial-Order Relationships for Image Captioning Murad Popattia Muhammad Rafi Rizwan Qureshi Shah Nawaz 19 4 0 15 Apr 2022
Image Captioning In the Transformer Age Yangliu Xu Li Li Haiyang Xu Songfang Huang Fei Huang Jianfei Cai ViT 14 5 0 15 Apr 2022
Vision Transformers in Medical Computer Vision -- A Contemplative Retrospection Arshi Parvaiz Muhammad Anwaar Khalid Rukhsana Zafar Huma Ameer M. Ali M. Fraz MedIm 11 59 0 29 Mar 2022
Interactive Robotic Grasping with Attribute-Guided Disambiguation Yang Yang Xibai Lou Changhyun Choi 11 30 0 15 Mar 2022
Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition Peipei Zhu Xiao Wang Yong Luo Zhenglong Sun Wei-Shi Zheng Yaowei Wang C. L. P. Chen 22 12 0 07 Mar 2022
A Review of Emerging Research Directions in Abstract Visual Reasoning Mikołaj Małkiński Jacek Mańdziuk 23 38 0 21 Feb 2022
ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning J. Tan Y. Tan C. Chan Joon Huang Chuah VLM ViT 16 15 0 11 Feb 2022
Deep Learning Approaches on Image Captioning: A Review Taraneh Ghandi H. Pourreza H. Mahyar VLM 8 89 0 31 Jan 2022
A Frustratingly Simple Approach for End-to-End Image Captioning Ziyang Luo Yadong Xi Rongsheng Zhang Jing Ma VLM MLLM 22 16 0 30 Jan 2022
Automatic Audio Captioning using Attention weighted Event based Embeddings Swapnil Bhosale Rupayan Chakraborty Sunil Kumar Kopparapu 26 0 0 28 Jan 2022
Beyond Simple Meta-Learning: Multi-Purpose Models for Multi-Domain, Active and Continual Few-Shot Learning Peyman Bateni Jarred Barber Raghav Goyal Vaden Masrani Jan Willem van de Meent Leonid Sigal Frank D. Wood BDL VLM 42 9 0 13 Jan 2022
Technical Language Supervision for Intelligent Fault Diagnosis in Process Industry Karl Lowenmark C. Taal S. Schnabel Marcus Liwicki Fredrik Sandin 13 7 0 11 Dec 2021
Multimodal Fake News Detection Santiago Alonso-Bartolome Isabel Segura-Bedmar 17 60 0 09 Dec 2021
Neural Attention for Image Captioning: Review of Outstanding Methods Zanyar Zohourianshahzadi Jugal Kalita VLM 19 45 0 29 Nov 2021
Multi-Glimpse Network: A Robust and Efficient Classification Architecture based on Recurrent Downsampled Attention S. Tan Runpei Dong Kaisheng Ma 22 2 0 03 Nov 2021
Deep Learning in Human Activity Recognition with Wearable Sensors: A Review on Advances Shibo Zhang Yaxuan Li Shen Zhang Farzad Shahabi S. Xia Yuanbei Deng N. Alshurafa BDL 20 295 0 31 Oct 2021
End-to-End Supermask Pruning: Learning to Prune Image Captioning Models J. Tan C. Chan Joon Huang Chuah VLM 49 16 0 07 Oct 2021
Learning Structural Representations for Recipe Generation and Food Retrieval Hao Wang Guosheng Lin S. Hoi C. Miao 16 28 0 04 Oct 2021
Similar Scenes arouse Similar Emotions: Parallel Data Augmentation for Stylized Image Captioning Guodun Li Yuchen Zhai Zehao Lin Yin Zhang 43 21 0 26 Aug 2021
INVIGORATE: Interactive Visual Grounding and Grasping in Clutter Hanbo Zhang Yunfan Lu Cunjun Yu David Hsu Xuguang Lan Nanning Zheng LM&Ro 18 63 0 25 Aug 2021
Explainable Reinforcement Learning for Broad-XAI: A Conceptual Framework and Survey Richard Dazeley Peter Vamplew Francisco Cruz 24 59 0 20 Aug 2021