1810.04020
Cited By
A Comprehensive Survey of Deep Learning for Image Captioning
6 October 2018
Md. Zakir Hossain
Ferdous Sohel
M. Shiratuddin
Hamid Laga
VLM
3DV
Papers citing
"A Comprehensive Survey of Deep Learning for Image Captioning"
50 / 228 papers shown
Title
Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook
Muyi Bao
Shuchang Lyu
Zhaoyang Xu
Huiyu Zhou
Jinchang Ren
Shiming Xiang
X. Li
Guangliang Cheng
Mamba
77
0
0
01 May 2025
DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation
Sang-Jun Park
Keun-Soo Heo
Dong-Hee Shin
Young-Han Son
Ji-Hye Oh
Tae-Eui Kam
MedIm
34
0
0
16 Apr 2025
Building Trustworthy Multimodal AI: A Review of Fairness, Transparency, and Ethics in Vision-Language Tasks
Mohammad Saleh
Azadeh Tabatabaei
52
0
0
14 Apr 2025
MicroNN: An On-device Disk-resident Updatable Vector Database
Jeffrey Pound
Floris Chabert
Arjun Bhushan
Ankur Goswami
Anil Pacaci
S. R. Chowdhury
24
0
0
08 Apr 2025
ImageSet2Text: Describing Sets of Images through Text
Piera Riccio
F. Galati
Kajetan Schweighofer
Noa Garcia
Nuria Oliver
VLM
CoGe
72
0
0
25 Mar 2025
A Survey on Mathematical Reasoning and Optimization with Large Language Models
Ali Forootani
OffRL
LRM
AI4CE
40
0
0
22 Mar 2025
Integrating Frequency-Domain Representations with Low-Rank Adaptation in Vision-Language Models
Md Azim Khan
A. Gangopadhyay
Jianwu Wang
Robert F. Erbacher
VLM
52
0
0
08 Mar 2025
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
Rui Zhao
Weijia Mao
Mike Zheng Shou
64
0
0
05 Mar 2025
AC-Lite: A Lightweight Image Captioning Model for Low-Resource Assamese Language
Pankaj Choudhury
Yogesh Aggarwal
Prabhanjan Jadhav
Prithwijit Guha
Sukumar Nandi
74
0
0
03 Mar 2025
Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding
Wei Suo
Lijun Zhang
Mengyang Sun
Lin Yuanbo Wu
Peng Wang
Y. Zhang
MLLM
VLM
47
1
0
01 Mar 2025
Beyond RNNs: Benchmarking Attention-Based Image Captioning Models
Hemanth Teja Yanambakkam
Rahul Chinthala
33
0
0
26 Feb 2025
A Comprehensive Survey on Image Signal Processing Approaches for Low-Illumination Image Enhancement
Muhammad Turab
53
0
0
09 Feb 2025
An Ensemble Model with Attention Based Mechanism for Image Captioning
Israa Al Badarneh
Bassam Hammo
Omar Al-Kadi
45
3
0
28 Jan 2025
Mathematical Language Models: A Survey
W. Liu
Hanglei Hu
Jie Zhou
Yuyang Ding
Junsong Li
...
Mengliang He
Qin Chen
Bo Jiang
Aimin Zhou
Liang He
LRM
79
12
0
03 Jan 2025
MOSABench: Multi-Object Sentiment Analysis Benchmark for Evaluating Multimodal Large Language Models Understanding of Complex Image
Shezheng Song
Chengxiang He
Shasha Li
Shan Zhao
Chengyu Wang
...
Xiaopeng Li
Qian Wan
Jun Ma
Jie Yu
Xiaoguang Mao
VLM
82
1
0
25 Nov 2024
Exploring Large Language Models for Multimodal Sentiment Analysis: Challenges, Benchmarks, and Future Directions
Shezheng Song
61
0
0
23 Nov 2024
Incremental IVF Index Maintenance for Streaming Vector Search
J. Mohoney
Anil Pacaci
S. R. Chowdhury
U. F. Minhas
Jeffrey Pound
Cédric Renggli
Nima Reyhani
Ihab F. Ilyas
Theodoros Rekatsinas
Shivaram Venkataraman
26
1
0
01 Nov 2024
Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models
J. Ren
Kangrui Chen
Chen Chen
Vikash Sehwag
Yue Xing
Jiliang Tang
Lingjuan Lyu
24
1
0
16 Oct 2024
NaVIP: An Image-Centric Indoor Navigation Solution for Visually Impaired People
Jun Yu
Yifan Zhang
Badrinadh Aila
V. Namboodiri
28
1
0
08 Oct 2024
EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment
Yifei Xing
Xiangyuan Lan
Ruiping Wang
D. Jiang
Wenjun Huang
Qingfang Zheng
Yaowei Wang
Mamba
33
0
0
08 Oct 2024
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
Kazuki Matsuda
Yuiga Wada
Komei Sugiura
26
1
0
28 Sep 2024
Text2Traj2Text: Learning-by-Synthesis Framework for Contextual Captioning of Human Movement Trajectories
Hikaru Asano
Ryo Yonetani
Taiki Sekii
Hiroki Ouchi
67
0
0
19 Sep 2024
TropNNC: Structured Neural Network Compression Using Tropical Geometry
Konstantinos Fotopoulos
Petros Maragos
Panagiotis Misiakos
16
0
0
05 Sep 2024
Coalitions of AI-based Methods Predict 15-Year Risks of Breast Cancer Metastasis Using Real-World Clinical Data with AUC up to 0.9
Xia Jiang
Yijun Zhou
Alan Wells
A. Brufsky
OOD
AI4CE
23
0
0
29 Aug 2024
Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DV
VLM
25
0
0
28 Aug 2024
Bi-directional Contextual Attention for 3D Dense Captioning
Minjung Kim
Hyung Suk Lim
Soonyoung Lee
Bumsoo Kim
Gunhee Kim
35
3
0
13 Aug 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
27
0
0
09 Aug 2024
Dual-path Collaborative Generation Network for Emotional Video Captioning
Cheng Ye
Weidong Chen
Jingyu Li
L. Zhang
Zhendong Mao
82
1
0
06 Aug 2024
The Phantom Menace: Unmasking Privacy Leakages in Vision-Language Models
Simone Caldarella
Massimiliano Mancini
Elisa Ricci
Rahaf Aljundi
PILM
37
1
0
02 Aug 2024
Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images
Bo Yuan
Danpei Zhao
Zhuoran Liu
Wentao Li
Tian Li
CLL
VLM
28
2
0
19 Jul 2024
BadRobot: Jailbreaking Embodied LLMs in the Physical World
Hangtao Zhang
Chenyu Zhu
Xianlong Wang
Ziqi Zhou
Yichen Wang
...
Shengshan Hu
Aishan Liu
Peijin Guo
Leo Yu Zhang
LM&Ro
42
7
0
16 Jul 2024
Graph Transformers: A Survey
Ahsan Shehzad
Feng Xia
Shagufta Abid
Ciyuan Peng
Shuo Yu
Dongyu Zhang
Karin Verspoor
AI4CE
29
9
0
13 Jul 2024
Unexplainability of Artificial Intelligence Judgments in Kant's Perspective
Jongwoo Seo
26
0
0
12 Jul 2024
Semantic GUI Scene Learning and Video Alignment for Detecting Duplicate Video-based Bug Reports
Yanfu Yan
Nathan Cooper
Oscar Chaparro
Kevin Moran
Denys Poshyvanyk
35
5
0
11 Jul 2024
Enhancing Multimodal Large Language Models with Multi-instance Visual Prompt Generator for Visual Representation Enrichment
Wenliang Zhong
Wenyi Wu
Qi Li
Rob Barton
Boxin Du
Shioulin Sam
Karim Bouyarmane
Ismail B. Tutar
Junzhou Huang
25
3
0
05 Jun 2024
Dreamguider: Improved Training free Diffusion-based Conditional Generation
Nithin Gopalakrishnan Nair
Vishal M. Patel
30
2
0
04 Jun 2024
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy
Weichao Zhao
Hao Feng
Qi Liu
Jingqun Tang
Shubo Wei
...
Lei Liao
Yongjie Ye
Hao Liu
Houqiang Li
Can Huang
LMTD
28
17
0
03 Jun 2024
Scene Graph Generation Strategy with Co-occurrence Knowledge and Learnable Term Frequency
H. Kim
Sangwon Kim
Dasom Ahn
Jong Taek Lee
ByoungChul Ko
40
2
0
21 May 2024
Referring Flexible Image Restoration
Runwei Guan
Rongsheng Hu
Zhuhao Zhou
Tianlang Xue
Ka Lok Man
Jeremy S. Smith
Eng Gee Lim
Weiping Ding
Yutao Yue
32
0
0
16 Apr 2024
AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception
Yipo Huang
Xiangfei Sheng
Zhichao Yang
Quan Yuan
Zhichao Duan
Pengfei Chen
Leida Li
Weisi Lin
Guangming Shi
34
23
0
15 Apr 2024
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching
Yang Yang
36
0
0
26 Mar 2024
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
Zeyu Han
Chao Gao
Jinyang Liu
Jeff Zhang
Sai Qian Zhang
139
308
0
21 Mar 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
40
10
0
12 Mar 2024
VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models
Jiawei Liang
Siyuan Liang
Man Luo
Aishan Liu
Dongchen Han
Ee-Chien Chang
Xiaochun Cao
38
37
0
21 Feb 2024
Transfer Learning in Human Activity Recognition: A Survey
Sourish Gunesh Dhekane
Thomas Ploetz
MU
AI4TS
25
34
0
18 Jan 2024
AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception
Yipo Huang
Quan Yuan
Xiangfei Sheng
Zhichao Yang
Haoning Wu
Pengfei Chen
Yuzhe Yang
Leida Li
Weisi Lin
VLM
19
37
0
16 Jan 2024
Safe Reinforcement Learning with Free-form Natural Language Constraints and Pre-Trained Language Models
Xingzhou Lou
Junge Zhang
Ziyan Wang
Kaiqi Huang
Yali Du
30
3
0
15 Jan 2024
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLM
MLLM
38
29
0
19 Dec 2023
Assessing GPT4-V on Structured Reasoning Tasks
Mukul Singh
J. Cambronero
Sumit Gulwani
Vu Le
Gust Verbruggen
LRM
35
10
0
13 Dec 2023
User-Aware Prefix-Tuning is a Good Learner for Personalized Image Captioning
Xuan Wang
Guanhong Wang
Wenhao Chai
Jiayu Zhou
Gaoang Wang
27
4
0
08 Dec 2023