From Show to Tell: A Survey on Deep Learning-based Image Captioning

14 July 2021

Lorenzo Baraldi

Papers citing "From Show to Tell: A Survey on Deep Learning-based Image Captioning"

50 / 115 papers shown

Emotional Theory of Mind: Bridging Fast Visual Processing with Slow Linguistic Reasoning Yasaman Etesam Özge Nilay Yalçin Chuxuan Zhang Angelica Lim 19 2 0 30 Oct 2023
Guided Attention for Interpretable Motion Captioning Karim Radouane Andon Tchechmedjiev Sylvie Ranwez Julien Lagarde 19 1 0 11 Oct 2023
Propagating Semantic Labels in Video Data David Balaban Justin Medich Pranay Gosar Justin W. Hart VLM 12 1 0 01 Oct 2023
A Survey on Image-text Multimodal Models Ruifeng Guo Jingxuan Wei Linzhuang Sun Khai Le-Duc Guiyong Chang Dawei Liu Sibo Zhang Zhengbing Yao Mingjun Xu Liping Bu VLM 13 5 0 23 Sep 2023
CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning Hongyu Hu Jiyuan Zhang Minyi Zhao Zhenbang Sun MLLM 12 20 0 05 Sep 2023
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning Manuele Barraco Sara Sarto Marcella Cornia Lorenzo Baraldi Rita Cucchiara VLM 43 10 0 23 Aug 2023
Diffusion Based Augmentation for Captioning and Retrieval in Cultural Heritage Dario Cioni Lorenzo Berlincioni Federico Becattini A. Bimbo DiffM 8 4 0 14 Aug 2023
GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text Peng Liu Yiming Ren Jun Tao Zhixiang Ren AI4CE 17 38 0 14 Aug 2023
A Comprehensive Analysis of Real-World Image Captioning and Scene Identification Sai Suprabhanu Nallapaneni Subrahmanyam Konakanchi 17 1 0 05 Aug 2023
Reverse Stable Diffusion: What prompt was used to generate this image? Florinel-Alin Croitoru Vlad Hondru Radu Tudor Ionescu M. Shah VLM DiffM 10 3 0 02 Aug 2023
BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models J. Vice Naveed Akhtar Richard I. Hartley Ajmal Saeed Mian SILM DiffM 11 18 0 31 Jul 2023
LP-MusicCaps: LLM-Based Pseudo Music Captioning Seungheon Doh Keunwoo Choi Jongpil Lee Juhan Nam 11 43 0 31 Jul 2023
Synaptic Plasticity Models and Bio-Inspired Unsupervised Deep Learning: A Survey Gabriele Lagani Fabrizio Falchi Claudio Gennaro Giuseppe Amato AAML 11 3 0 30 Jul 2023
EnTri: Ensemble Learning with Tri-level Representations for Explainable Scene Recognition Amirhossein Aminimehr Amir Molaei Erik Cambria 15 1 0 23 Jul 2023
FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback Ashish Singh Prateek R. Agarwal Zixuan Huang Arpita Singh Tong Yu Sungchul Kim Victor S. Bursztyn N. Vlassis Ryan A. Rossi 12 5 0 20 Jul 2023
TbExplain: A Text-based Explanation Method for Scene Classification Models with the Statistical Prediction Correction Amirhossein Aminimehr Pouya Khani Amir Molaei Amirmohammad Kazemeini Erik Cambria FAtt 9 5 0 19 Jul 2023
Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning Zijie Song Zhenzhen Hu Yuanen Zhou Ye Zhao Richang Hong Meng Wang 16 2 0 19 Jul 2023
MMNet: Multi-Collaboration and Multi-Supervision Network for Sequential Deepfake Detection Ruiyang Xia Decheng Liu Jie Li Lin Yuan N. Wang Xinbo Gao 13 6 0 06 Jul 2023
Text + Sketch: Image Compression at Ultra Low Rates E. Lei Yiğit Berkay Uslu Hamed Hassani Shirin Saeedi Bidokhti DiffM 8 36 0 04 Jul 2023
Self-Supervised Image Captioning with CLIP Chuanyang Jin VLM SSL 21 1 0 26 Jun 2023
Sample-Efficient Learning of Novel Visual Concepts Sarthak Bhagat Simon Stepputtis Joseph Campbell Katia P. Sycara 27 4 0 15 Jun 2023
Embodied Executable Policy Learning with Language-based Scene Summarization Jielin Qiu Mengdi Xu William Jongwon Han Seungwhan Moon Ding Zhao LM&Ro 11 7 0 09 Jun 2023
"Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning Abisek Rajakumar Kalarani P. Bhattacharyya Niyati Chhaya Sumit Shekhar CoGe VLM 8 6 0 01 Jun 2023
Using Visual Cropping to Enhance Fine-Detail Question Answering of BLIP-Family Models Jiarui Zhang Mahyar Khayatkhoei P. Chhikara Filip Ilievski 19 1 0 31 May 2023
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning Chia-Wen Kuo Z. Kira 17 9 0 25 May 2023
PLIP: Language-Image Pre-training for Person Representation Learning Jia-li Zuo Jiahao Hong Feng Zhang Changqian Yu Hanyu Zhou Changxin Gao Nong Sang Jingdong Wang VLM MLLM 11 28 0 15 May 2023
A Survey on the Robustness of Computer Vision Models against Common Corruptions Shunxin Wang Raymond N. J. Veldhuis Christoph Brune N. Strisciuglio OOD VLM 13 11 0 10 May 2023
Multimodal Understanding Through Correlation Maximization and Minimization Yi Shi Marc Niethammer 22 0 0 04 May 2023
Cross-Domain Image Captioning with Discriminative Finetuning Roberto Dessì Michele Bevilacqua Eleonora Gualdoni Nathanaël Carraz Rakotonirina Francesca Franzon Marco Baroni CLIP 9 8 0 04 Apr 2023
Changes to Captions: An Attentive Network for Remote Sensing Change Captioning Shizhen Chang Pedram Ghamisi 14 43 0 03 Apr 2023
Cross-Modal Causal Intervention for Medical Report Generation Weixing Chen Yang Liu Ce Wang Jiarui Zhu Shen Zhao Guanbin Li Cheng-Lin Liu Liang Lin 9 1 0 16 Mar 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT Yihan Cao Siyu Li Yixin Liu Zhiling Yan Yutong Dai Philip S. Yu Lichao Sun 11 332 0 07 Mar 2023
The Contribution of Knowledge in Visiolinguistic Learning: A Survey on Tasks and Challenges Maria Lymperaiou Giorgos Stamou VLM 18 4 0 04 Mar 2023
Guiding Pretraining in Reinforcement Learning with Large Language Models Yuqing Du Olivia Watkins Zihan Wang Cédric Colas Trevor Darrell Pieter Abbeel Abhishek Gupta Jacob Andreas LM&Ro 6 111 0 13 Feb 2023
Advances in Medical Image Analysis with Vision Transformers: A Comprehensive Review Reza Azad A. Kazerouni Moein Heidari Ehsan Khodapanah Aghdam Amir Molaei Yiwei Jia Abin Jose Rijo Roy Dorit Merhof MedIm ViT 11 98 0 09 Jan 2023
Using Large Language Models to Generate Engaging Captions for Data Visualizations A. Liew Klaus Mueller 13 7 0 27 Dec 2022
A survey on knowledge-enhanced multimodal learning Maria Lymperaiou Giorgos Stamou 28 6 0 19 Nov 2022
Artificial intelligence approaches for materials-by-design of energetic materials: state-of-the-art, challenges, and future directions Joseph B. Choi Phong C. H. Nguyen O. Sen H. Udaykumar Stephen Seung-Yeob Baek PINN AI4CE 6 6 0 15 Nov 2022
Novel 3D Scene Understanding Applications From Recurrence in a Single Image Shimian Zhang Skanda Bharadwaj Keaton Kraiger Yashasvi Asthana Hong Zhang R. Collins Yanxi Liu 23 0 0 14 Oct 2022
Affection: Learning Affective Explanations for Real-World Visual Data Panos Achlioptas M. Ovsjanikov Leonidas J. Guibas Sergey Tulyakov 43 10 0 04 Oct 2022
GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement Zhi-Qi Cheng Qianwen Dai Siyao Li Teruko Mitamura Alexander G. Hauptmann 6 25 0 18 Aug 2022
The LAM Dataset: A Novel Benchmark for Line-Level Handwritten Text Recognition S. Cascianelli Vittorio Pippi Martin Maarand Marcella Cornia Lorenzo Baraldi Christopher Kermorvant Rita Cucchiara 6 6 0 16 Aug 2022
ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval Nicola Messina Matteo Stefanini Marcella Cornia Lorenzo Baraldi Fabrizio Falchi Giuseppe Amato Rita Cucchiara VLM 9 14 0 29 Jul 2022
Are metrics measuring what they should? An evaluation of image captioning task metrics Othón González-Chávez Guillermo Ruiz Daniela Moctezuma Tania A. Ramirez-delreal 11 9 0 04 Jul 2022
SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text Pinaki Nath Chowdhury A. Bhunia Aneeshan Sain Subhadeep Koley Tao Xiang Yi-Zhe Song 22 21 0 25 Apr 2022
Video Captioning: a comparative review of where we are and which could be the route Daniela Moctezuma Tania A. Ramirez-delreal Guillermo Ruiz Othón González-Chávez 6 8 0 12 Apr 2022
CaMEL: Mean Teacher Learning for Image Captioning Manuele Barraco Matteo Stefanini Marcella Cornia S. Cascianelli Lorenzo Baraldi Rita Cucchiara ViT VLM 17 26 0 21 Feb 2022
A Review of Emerging Research Directions in Abstract Visual Reasoning Mikołaj Małkiński Jacek Mańdziuk 8 38 0 21 Feb 2022
A Frustratingly Simple Approach for End-to-End Image Captioning Ziyang Luo Yadong Xi Rongsheng Zhang Jing Ma VLM MLLM 15 16 0 30 Jan 2022
Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets Marcella Cornia Lorenzo Baraldi G. Fiameni Rita Cucchiara 12 12 0 24 Nov 2021