From Show to Tell: A Survey on Deep Learning-based Image Captioning


14 July 2021
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
    3DV
    VLM
    MLLM

Papers citing "From Show to Tell: A Survey on Deep Learning-based Image Captioning"

50 / 115 papers shown
Title
Emotional Theory of Mind: Bridging Fast Visual Processing with Slow Linguistic Reasoning
Yasaman Etesam
Özge Nilay Yalçin
Chuxuan Zhang
Angelica Lim
19
2
0
30 Oct 2023
Guided Attention for Interpretable Motion Captioning
Karim Radouane
Andon Tchechmedjiev
Sylvie Ranwez
Julien Lagarde
19
1
0
11 Oct 2023
Propagating Semantic Labels in Video Data
David Balaban
Justin Medich
Pranay Gosar
Justin W. Hart
VLM
12
1
0
01 Oct 2023
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai Le-Duc
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
13
5
0
23 Sep 2023
CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning
Hongyu Hu
Jiyuan Zhang
Minyi Zhao
Zhenbang Sun
MLLM
12
20
0
05 Sep 2023
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
Manuele Barraco
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
43
10
0
23 Aug 2023
Diffusion Based Augmentation for Captioning and Retrieval in Cultural Heritage
Dario Cioni
Lorenzo Berlincioni
Federico Becattini
A. Bimbo
DiffM
8
4
0
14 Aug 2023
GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text
Peng Liu
Yiming Ren
Jun Tao
Zhixiang Ren
AI4CE
17
38
0
14 Aug 2023
A Comprehensive Analysis of Real-World Image Captioning and Scene Identification
Sai Suprabhanu Nallapaneni
Subrahmanyam Konakanchi
17
1
0
05 Aug 2023
Reverse Stable Diffusion: What prompt was used to generate this image?
Florinel-Alin Croitoru
Vlad Hondru
Radu Tudor Ionescu
M. Shah
VLM
DiffM
10
3
0
02 Aug 2023
BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models
J. Vice
Naveed Akhtar
Richard I. Hartley
Ajmal Saeed Mian
SILM
DiffM
11
18
0
31 Jul 2023
LP-MusicCaps: LLM-Based Pseudo Music Captioning
Seungheon Doh
Keunwoo Choi
Jongpil Lee
Juhan Nam
11
43
0
31 Jul 2023
Synaptic Plasticity Models and Bio-Inspired Unsupervised Deep Learning: A Survey
Gabriele Lagani
Fabrizio Falchi
Claudio Gennaro
Giuseppe Amato
AAML
11
3
0
30 Jul 2023
EnTri: Ensemble Learning with Tri-level Representations for Explainable Scene Recognition
Amirhossein Aminimehr
Amir Molaei
Erik Cambria
15
1
0
23 Jul 2023
FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback
Ashish Singh
Prateek R. Agarwal
Zixuan Huang
Arpita Singh
Tong Yu
Sungchul Kim
Victor S. Bursztyn
N. Vlassis
Ryan A. Rossi
12
5
0
20 Jul 2023
TbExplain: A Text-based Explanation Method for Scene Classification Models with the Statistical Prediction Correction
Amirhossein Aminimehr
Pouya Khani
Amir Molaei
Amirmohammad Kazemeini
Erik Cambria
FAtt
9
5
0
19 Jul 2023
Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning
Zijie Song
Zhenzhen Hu
Yuanen Zhou
Ye Zhao
Richang Hong
Meng Wang
16
2
0
19 Jul 2023
MMNet: Multi-Collaboration and Multi-Supervision Network for Sequential Deepfake Detection
Ruiyang Xia
Decheng Liu
Jie Li
Lin Yuan
N. Wang
Xinbo Gao
13
6
0
06 Jul 2023
Text + Sketch: Image Compression at Ultra Low Rates
E. Lei
Yiğit Berkay Uslu
Hamed Hassani
Shirin Saeedi Bidokhti
DiffM
8
36
0
04 Jul 2023
Self-Supervised Image Captioning with CLIP
Chuanyang Jin
VLM
SSL
21
1
0
26 Jun 2023
Sample-Efficient Learning of Novel Visual Concepts
Sarthak Bhagat
Simon Stepputtis
Joseph Campbell
Katia P. Sycara
27
4
0
15 Jun 2023
Embodied Executable Policy Learning with Language-based Scene Summarization
Jielin Qiu
Mengdi Xu
William Jongwon Han
Seungwhan Moon
Ding Zhao
LM&Ro
11
7
0
09 Jun 2023
"Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning
Abisek Rajakumar Kalarani
P. Bhattacharyya
Niyati Chhaya
Sumit Shekhar
CoGe
VLM
8
6
0
01 Jun 2023
Using Visual Cropping to Enhance Fine-Detail Question Answering of BLIP-Family Models
Jiarui Zhang
Mahyar Khayatkhoei
P. Chhikara
Filip Ilievski
19
1
0
31 May 2023
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
Chia-Wen Kuo
Z. Kira
17
9
0
25 May 2023
PLIP: Language-Image Pre-training for Person Representation Learning
Jia-li Zuo
Jiahao Hong
Feng Zhang
Changqian Yu
Hanyu Zhou
Changxin Gao
Nong Sang
Jingdong Wang
VLM
MLLM
11
28
0
15 May 2023
A Survey on the Robustness of Computer Vision Models against Common Corruptions
Shunxin Wang
Raymond N. J. Veldhuis
Christoph Brune
N. Strisciuglio
OOD
VLM
13
11
0
10 May 2023
Multimodal Understanding Through Correlation Maximization and Minimization
Yi Shi
Marc Niethammer
22
0
0
04 May 2023
Cross-Domain Image Captioning with Discriminative Finetuning
Roberto Dessì
Michele Bevilacqua
Eleonora Gualdoni
Nathanaël Carraz Rakotonirina
Francesca Franzon
Marco Baroni
CLIP
9
8
0
04 Apr 2023
Changes to Captions: An Attentive Network for Remote Sensing Change Captioning
Shizhen Chang
Pedram Ghamisi
14
43
0
03 Apr 2023
Cross-Modal Causal Intervention for Medical Report Generation
Weixing Chen
Yang Liu
Ce Wang
Jiarui Zhu
Shen Zhao
Guanbin Li
Cheng-Lin Liu
Liang Lin
9
1
0
16 Mar 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao
Siyu Li
Yixin Liu
Zhiling Yan
Yutong Dai
Philip S. Yu
Lichao Sun
11
332
0
07 Mar 2023
The Contribution of Knowledge in Visiolinguistic Learning: A Survey on Tasks and Challenges
Maria Lymperaiou
Giorgos Stamou
VLM
18
4
0
04 Mar 2023
Guiding Pretraining in Reinforcement Learning with Large Language Models
Yuqing Du
Olivia Watkins
Zihan Wang
Cédric Colas
Trevor Darrell
Pieter Abbeel
Abhishek Gupta
Jacob Andreas
LM&Ro
6
111
0
13 Feb 2023
Advances in Medical Image Analysis with Vision Transformers: A Comprehensive Review
Reza Azad
A. Kazerouni
Moein Heidari
Ehsan Khodapanah Aghdam
Amir Molaei
Yiwei Jia
Abin Jose
Rijo Roy
Dorit Merhof
MedIm
ViT
11
98
0
09 Jan 2023
Using Large Language Models to Generate Engaging Captions for Data Visualizations
A. Liew
Klaus Mueller
13
7
0
27 Dec 2022
A survey on knowledge-enhanced multimodal learning
Maria Lymperaiou
Giorgos Stamou
28
6
0
19 Nov 2022
Artificial intelligence approaches for materials-by-design of energetic materials: state-of-the-art, challenges, and future directions
Joseph B. Choi
Phong C. H. Nguyen
O. Sen
H. Udaykumar
Stephen Seung-Yeob Baek
PINN
AI4CE
6
6
0
15 Nov 2022
Novel 3D Scene Understanding Applications From Recurrence in a Single Image
Shimian Zhang
Skanda Bharadwaj
Keaton Kraiger
Yashasvi Asthana
Hong Zhang
R. Collins
Yanxi Liu
23
0
0
14 Oct 2022
Affection: Learning Affective Explanations for Real-World Visual Data
Panos Achlioptas
M. Ovsjanikov
Leonidas J. Guibas
Sergey Tulyakov
43
10
0
04 Oct 2022
GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement
Zhi-Qi Cheng
Qianwen Dai
Siyao Li
Teruko Mitamura
Alexander G. Hauptmann
6
25
0
18 Aug 2022
The LAM Dataset: A Novel Benchmark for Line-Level Handwritten Text Recognition
S. Cascianelli
Vittorio Pippi
Martin Maarand
Marcella Cornia
Lorenzo Baraldi
Christopher Kermorvant
Rita Cucchiara
6
6
0
16 Aug 2022
ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval
Nicola Messina
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
Fabrizio Falchi
Giuseppe Amato
Rita Cucchiara
VLM
9
14
0
29 Jul 2022
Are metrics measuring what they should? An evaluation of image captioning task metrics
Othón González-Chávez
Guillermo Ruiz
Daniela Moctezuma
Tania A. Ramirez-delreal
11
9
0
04 Jul 2022
SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text
Pinaki Nath Chowdhury
A. Bhunia
Aneeshan Sain
Subhadeep Koley
Tao Xiang
Yi-Zhe Song
22
21
0
25 Apr 2022
Video Captioning: a comparative review of where we are and which could be the route
Daniela Moctezuma
Tania A. Ramirez-delreal
Guillermo Ruiz
Othón González-Chávez
6
8
0
12 Apr 2022
CaMEL: Mean Teacher Learning for Image Captioning
Manuele Barraco
Matteo Stefanini
Marcella Cornia
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
ViT
VLM
17
26
0
21 Feb 2022
A Review of Emerging Research Directions in Abstract Visual Reasoning
Mikołaj Małkiński
Jacek Mańdziuk
8
38
0
21 Feb 2022
A Frustratingly Simple Approach for End-to-End Image Captioning
Ziyang Luo
Yadong Xi
Rongsheng Zhang
Jing Ma
VLM
MLLM
15
16
0
30 Jan 2022
Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets
Marcella Cornia
Lorenzo Baraldi
G. Fiameni
Rita Cucchiara
12
12
0
24 Nov 2021