2107.06912
From Show to Tell: A Survey on Deep Learning-based Image Captioning
14 July 2021
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
Papers citing
"From Show to Tell: A Survey on Deep Learning-based Image Captioning"
50 / 115 papers shown
Title
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
26
0
0
03 Apr 2025
Semantic-Spatial Feature Fusion with Dynamic Graph Refinement for Remote Sensing Image Captioning
Maofu Liu
Jiahui Liu
Xiaokang Zhang
27
0
0
30 Mar 2025
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Xiaoye Qu
Yafu Li
Zhaochen Su
Weigao Sun
Jianhao Yan
...
Chaochao Lu
Yue Zhang
Xian-Sheng Hua
Bowen Zhou
Yu Cheng
ReLM
OffRL
LRM
76
11
0
27 Mar 2025
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
Sara Sarto
Marcella Cornia
Rita Cucchiara
36
0
0
18 Mar 2025
Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space
Weichen Zhan
Zile Zhou
Zhiheng Zheng
Chen Gao
Jinqiang Cui
Y. Li
Xinlei Chen
Xiao-Ping Zhang
LRM
57
1
0
14 Mar 2025
SySLLM: Generating Synthesized Policy Summaries for Reinforcement Learning Agents Using Large Language Models
Sahar Admoni
Omer Ben-Porat
Ofra Amir
LLMAG
39
0
0
13 Mar 2025
SuperCap: Multi-resolution Superpixel-based Image Captioning
Henry Senior
Luca Rossi
Gregory Slabaugh
Shanxin Yuan
VLM
58
0
0
11 Mar 2025
Enhancing Abnormality Grounding for Vision Language Models with Knowledge Descriptions
Jun Yu Li
Che Liu
Wenjia Bai
Rossella Arcucci
Cosmin I. Bercea
Julia A. Schnabel
34
0
0
05 Mar 2025
ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models
Danae Sánchez Villegas
Ingo Ziegler
Desmond Elliott
LRM
35
1
0
26 Feb 2025
Omnidirectional Image Quality Captioning: A Large-scale Database and A New Model
Jiebin Yan
Ziwen Tan
Yuming Fang
Junjie Chen
Wenhui Jiang
Zhou Wang
59
2
0
24 Feb 2025
Multi-Branch Collaborative Learning Network for Video Quality Assessment in Industrial Video Search
Hengzhu Tang
Zefeng Zhang
Zhiping Li
Zhenyu Zhang
Xing Wu
Li Gao
Suqi Cheng
Dawei Yin
49
1
0
09 Feb 2025
An Ensemble Model with Attention Based Mechanism for Image Captioning
Israa Al Badarneh
Bassam Hammo
Omar Al-Kadi
40
2
0
28 Jan 2025
Generalized Task-Driven Medical Image Quality Enhancement with Gradient Promotion
Dong Zhang
Kwang-Ting Cheng
MedIm
20
0
0
03 Jan 2025
ErgoChat: a Visual Query System for the Ergonomic Risk Assessment of Construction Workers
Chao Fan
Qipei Mei
Xiaonan Wang
Xinming Li
28
3
0
31 Dec 2024
Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey
Jiayi Kuang
Jingyou Xie
Haohao Luo
Ronghao Li
Zhe Xu
Xianfeng Cheng
Yinghui Li
Xika Lin
Ying Shen
LRM
77
2
0
26 Nov 2024
LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation
Steven Song
Anirudh Subramanyam
Irene Madejski
Robert L. Grossman
MedIm
VLM
90
0
0
25 Nov 2024
An Efficient System for Automatic Map Storytelling -- A Case Study on Historical Maps
Ziyi Liu
Claudio Affolter
Sidi Wu
Yizi Chen
L. Hurni
11
0
0
21 Oct 2024
Hiding-in-Plain-Sight (HiPS) Attack on CLIP for Targetted Object Removal from Images
Arka Daw
Megan Hong-Thanh Chung
Maria Mahbub
Amir Sadovnik
AAML
14
0
0
16 Oct 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training
Sara Sarto
Nicholas Moratelli
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
18
0
0
09 Oct 2024
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
Kazuki Matsuda
Yuiga Wada
Komei Sugiura
16
0
0
28 Sep 2024
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning
Manu Gaur
Darshan Singh
Makarand Tapaswi
28
1
0
04 Sep 2024
See or Guess: Counterfactually Regularized Image Captioning
Qian Cao
Xu Chen
Ruihua Song
Xiting Wang
Xinting Huang
Yuchen Ren
CML
24
0
0
29 Aug 2024
Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DV
VLM
22
0
0
28 Aug 2024
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
Nicholas Moratelli
Davide Caffagni
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
CLIP
16
1
0
26 Aug 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
16
0
0
09 Aug 2024
UNMuTe: Unifying Navigation and Multimodal Dialogue-like Text Generation
Niyati Rawal
Roberto Bigazzi
Lorenzo Baraldi
Rita Cucchiara
LM&Ro
21
0
0
08 Aug 2024
VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks
Juhwan Choi
Junehyoung Kwon
Jungmin Yun
Seunguk Yu
Youngbin Kim
23
0
0
29 Jul 2024
Nearest Neighbor Future Captioning: Generating Descriptions for Possible Collisions in Object Placement Tasks
Takumi Komatsu
Motonari Kambara
Shumpei Hatanaka
Haruka Matsuo
Tsubasa Hirakawa
Takayoshi Yamashita
H. Fujiyoshi
Komei Sugiura
27
0
0
18 Jul 2024
NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics
Jingbo Zhou
Shaorong Chen
Jun-Xiong Xia
Sizhe Liu
Tianze Ling
Wenjie Du
Yue Liu
Jianwei Yin
Stan Z. Li
19
0
0
16 Jun 2024
Improving Large Models with Small models: Lower Costs and Better Performance
Dong Chen
Shuo Zhang
Yueting Zhuang
Siliang Tang
Qidong Liu
Hua Wang
Mingliang Xu
24
1
0
15 Jun 2024
Towards Retrieval-Augmented Architectures for Image Captioning
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Alessandro Nicolosi
Rita Cucchiara
VLM
17
1
0
21 May 2024
Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models
Tianze Xu
Jiajun Li
Xuesong Chen
Xinrui Yao
Shuchang Liu
17
4
0
05 May 2024
AIGeN: An Adversarial Approach for Instruction Generation in VLN
Niyati Rawal
Roberto Bigazzi
Lorenzo Baraldi
Rita Cucchiara
GAN
34
4
0
15 Apr 2024
CLIPping the Limits: Finding the Sweet Spot for Relevant Images in Automated Driving Systems Perception Testing
Philipp Rigoll
Laurenz Adolph
Lennart Ries
Eric Sax
24
0
0
08 Apr 2024
Correcting misinformation on social media with a large language model
Xinyi Zhou
Ashish Sharma
Amy X. Zhang
Tim Althoff
KELM
28
0
0
17 Mar 2024
Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning
Zijian Zhou
Miaojing Shi
Meng Wei
Oluwatosin O. Alabi
Zijie Yue
Tom Kamiel Magda Vercauteren
LM&MA
23
3
0
11 Mar 2024
VIXEN: Visual Text Comparison Network for Image Difference Captioning
Alexander Black
Jing Shi
Yifei Fai
Tu Bui
John Collomosse
34
3
0
29 Feb 2024
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada
Kanta Kaneda
Daichi Saito
Komei Sugiura
14
5
0
28 Feb 2024
AICAttack: Adversarial Image Captioning Attack with Attention-Based Optimization
Jiyao Li
Mingze Ni
Yifei Dong
Tianqing Zhu
Wei Liu
AAML
27
1
0
19 Feb 2024
KTVIC: A Vietnamese Image Captioning Dataset on the Life Domain
Anh-Cuong Pham
Van-Quang Nguyen
Thi-Hong Vuong
Quang-Thuy Ha
19
0
0
16 Jan 2024
Jewelry Recognition via Encoder-Decoder Models
José M. Alcalde-Llergo
Enrique Yeguas-Bolivar
Andrea Zingoni
Alejandro Fuerte-Jurado
19
0
0
15 Jan 2024
Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image Captioning
Zhiyue Liu
Jinyuan Liu
Fanrong Ma
CLIP
VLM
19
2
0
14 Dec 2023
Negative Pre-aware for Noisy Cross-modal Matching
Xu-Yao Zhang
Hao Li
Mang Ye
14
1
0
10 Dec 2023
User-Aware Prefix-Tuning is a Good Learner for Personalized Image Captioning
Xuan Wang
Guanhong Wang
Wenhao Chai
Jiayu Zhou
Gaoang Wang
25
2
0
08 Dec 2023
Mitigating Open-Vocabulary Caption Hallucinations
Assaf Ben-Kish
Moran Yanuka
Morris Alper
Raja Giryes
Hadar Averbuch-Elor
MLLM
VLM
6
2
0
06 Dec 2023
Segment and Caption Anything
Xiaoke Huang
Jianfeng Wang
Yansong Tang
Zheng Zhang
Han Hu
Jiwen Lu
Lijuan Wang
Zicheng Liu
MLLM
VLM
16
13
0
01 Dec 2023
4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling
Sherwin Bahmani
Ivan Skorokhodov
Victor Rong
Gordon Wetzstein
Leonidas J. Guibas
Peter Wonka
Sergey Tulyakov
Jeong Joon Park
Andrea Tagliasacchi
David B. Lindell
DiffM
18
48
0
29 Nov 2023
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions
Zeyu Han
Fangrui Zhu
Qianru Lao
Huaizu Jiang
ObjD
6
5
0
28 Nov 2023
Violet: A Vision-Language Model for Arabic Image Captioning with Gemini Decoder
Abdelrahman Mohamed
Fakhraddin Alwajih
El Moatez Billah Nagoudi
Alcides Alcoba Inciarte
Muhammad Abdul-Mageed
VLM
MLLM
15
5
0
15 Nov 2023
JaSPICE: Automatic Evaluation Metric Using Predicate-Argument Structures for Image Captioning Models
Yuiga Wada
Kanta Kaneda
Komei Sugiura
15
4
0
07 Nov 2023