arXiv:1810.04020
A Comprehensive Survey of Deep Learning for Image Captioning
6 October 2018
Md. Zakir Hossain
Ferdous Sohel
M. Shiratuddin
Hamid Laga
VLM
3DV
Papers citing
"A Comprehensive Survey of Deep Learning for Image Captioning"
50 / 228 papers shown
Title
Do DALL-E and Flamingo Understand Each Other?
Hang Li
Jindong Gu
Rajat Koner
Sahand Sharifzadeh
Volker Tresp
MLLM
16
12
0
23 Dec 2022
Towards Generating Diverse Audio Captions via Adversarial Training
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
DiffM
33
2
0
05 Dec 2022
PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
Runyu Ding
Jihan Yang
Chuhui Xue
Wenqing Zhang
Song Bai
Xiaojuan Qi
VLM
15
146
0
29 Nov 2022
Deep representation learning: Fundamentals, Perspectives, Applications, and Open Challenges
K. T. Baghaei
Amirreza Payandeh
Pooya Fayyazsanavi
Shahram Rahimi
Zhiqian Chen
Somayeh Bakhtiari Ramezani
FaML
AI4TS
30
6
0
27 Nov 2022
Aesthetically Relevant Image Captioning
Zhipeng Zhong
Fei Zhou
Guoping Qiu
31
9
0
25 Nov 2022
Feedback is Needed for Retakes: An Explainable Poor Image Notification Framework for the Visually Impaired
Kazuya Ohata
Shunsuke Kitada
Hitoshi Iyatomi
14
0
0
17 Nov 2022
CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge
Linli Yao
Wei-Neng Chen
Qin Jin
VLM
22
10
0
17 Nov 2022
Vis2Mus: Exploring Multimodal Representation Mapping for Controllable Music Generation
Runbang Zhang
Yixiao Zhang
Kai Shao
Ying Shan
Gus Xia
21
4
0
10 Nov 2022
CLSE: Corpus of Linguistically Significant Entities
A. Chuklin
Justin Zhao
Mihir Kale
13
1
0
04 Nov 2022
Physical Adversarial Attack meets Computer Vision: A Decade Survey
Hui Wei
Hao Tang
Xuemei Jia
Zhixiang Wang
Han-Bing Yu
Zhubo Li
Shin'ichi Satoh
Luc Van Gool
Zheng Wang
AAML
27
43
0
30 Sep 2022
M^4I: Multi-modal Models Membership Inference
Pingyi Hu
Zihan Wang
Ruoxi Sun
Hu Wang
Minhui Xue
37
26
0
15 Sep 2022
Cross Modal Compression: Towards Human-comprehensible Semantic Compression
Jiguo Li
Chuanmin Jia
Xinfeng Zhang
Siwei Ma
Wen Gao
9
18
0
06 Sep 2022
Facial Expression Recognition and Image Description Generation in Vietnamese
Khang Nhut Lam
Kim Thi-Thanh Nguyen
Loc Huu Nguy
Jugal Kalita
3DH
CVBM
15
1
0
12 Aug 2022
A Comprehensive Survey of Natural Language Generation Advances from the Perspective of Digital Deception
Keenan I. Jones
Enes Altuncu
V. N. Franqueira
Yi-Chia Wang
Shujun Li
DeLMO
34
3
0
11 Aug 2022
End-to-end deep learning for directly estimating grape yield from ground-based imagery
A. Olenskyj
B. Sams
Zhenghao Fei
Vishal Singh
P. Raja
G. Bornhorst
J. M. Earles
26
28
0
04 Aug 2022
Visual Recognition by Request
Chufeng Tang
Lingxi Xie
Xiaopeng Zhang
Xiaolin Hu
Qi Tian
VLM
16
15
0
28 Jul 2022
Controllable Data Generation by Deep Learning: A Review
Shiyu Wang
Yuanqi Du
Xiaojie Guo
Bo Pan
Zhaohui Qin
Liang Zhao
29
28
0
19 Jul 2022
Relational Future Captioning Model for Explaining Likely Collisions in Daily Tasks
Motonari Kambara
K. Sugiura
17
6
0
19 Jul 2022
Exploring Adversarial Examples and Adversarial Robustness of Convolutional Neural Networks by Mutual Information
Jiebao Zhang
Wenhua Qian
Ren-qi Nie
Jinde Cao
Dan Xu
GAN
AAML
17
0
0
12 Jul 2022
Vision-and-Language Pretraining
Thong Nguyen
Cong-Duy Nguyen
Xiaobao Wu
See-Kiong Ng
A. Luu
VLM
CLIP
19
2
0
05 Jul 2022
Gender Artifacts in Visual Datasets
Nicole Meister
Dora Zhao
Angelina Wang
V. V. Ramaswamy
Ruth C. Fong
Olga Russakovsky
21
28
0
18 Jun 2022
Image Captioning based on Feature Refinement and Reflective Decoding
G. Alabduljabbar
Hafida Benhidour
Said Kerrache
3DV
14
3
0
16 Jun 2022
Video-based Human-Object Interaction Detection from Tubelet Tokens
Danyang Tu
Wei Sun
Xiongkuo Min
Guangtao Zhai
Wei Shen
ViT
13
15
0
04 Jun 2022
A Generative Adversarial Network-based Selective Ensemble Characteristic-to-Expression Synthesis (SE-CTES) Approach and Its Applications in Healthcare
Yuxuan Li
Ying-Jia Lin
Chenang Liu
23
0
0
29 May 2022
Prompt-based Learning for Unpaired Image Captioning
Peipei Zhu
Xiao Wang
Lin Zhu
Zhenglong Sun
Wei-Shi Zheng
Yaowei Wang
C. L. P. Chen
VLM
21
31
0
26 May 2022
Beyond Greedy Search: Tracking by Multi-Agent Reinforcement Learning-based Beam Search
Xiao Wang
Zhe Chen
Bo Jiang
Jin Tang
B. Luo
Dacheng Tao
37
18
0
19 May 2022
Efficient Gesture Recognition for the Assistance of Visually Impaired People using Multi-Head Neural Networks
Samer Alashhab
Antonio Javier Gallego
Miguel Ángel Lozano
19
16
0
14 May 2022
Translation between Molecules and Natural Language
Carl N. Edwards
T. Lai
Kevin Ros
Garrett Honke
Kyunghyun Cho
Heng Ji
25
155
0
25 Apr 2022
Visual Attention Methods in Deep Learning: An In-Depth Survey
Mohammed Hassanin
Saeed Anwar
Ibrahim Radwan
F. Khan
Ajmal Saeed Mian
19
145
0
16 Apr 2022
Guiding Attention using Partial-Order Relationships for Image Captioning
Murad Popattia
Muhammad Rafi
Rizwan Qureshi
Shah Nawaz
19
4
0
15 Apr 2022
Image Captioning In the Transformer Age
Yangliu Xu
Li Li
Haiyang Xu
Songfang Huang
Fei Huang
Jianfei Cai
ViT
14
5
0
15 Apr 2022
Vision Transformers in Medical Computer Vision -- A Contemplative Retrospection
Arshi Parvaiz
Muhammad Anwaar Khalid
Rukhsana Zafar
Huma Ameer
M. Ali
M. Fraz
MedIm
11
59
0
29 Mar 2022
Interactive Robotic Grasping with Attribute-Guided Disambiguation
Yang Yang
Xibai Lou
Changhyun Choi
11
30
0
15 Mar 2022
Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition
Peipei Zhu
Xiao Wang
Yong Luo
Zhenglong Sun
Wei-Shi Zheng
Yaowei Wang
C. L. P. Chen
22
12
0
07 Mar 2022
A Review of Emerging Research Directions in Abstract Visual Reasoning
Mikołaj Małkiński
Jacek Mańdziuk
23
38
0
21 Feb 2022
ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning
J. Tan
Y. Tan
C. Chan
Joon Huang Chuah
VLM
ViT
16
15
0
11 Feb 2022
Deep Learning Approaches on Image Captioning: A Review
Taraneh Ghandi
H. Pourreza
H. Mahyar
VLM
8
89
0
31 Jan 2022
A Frustratingly Simple Approach for End-to-End Image Captioning
Ziyang Luo
Yadong Xi
Rongsheng Zhang
Jing Ma
VLM
MLLM
22
16
0
30 Jan 2022
Automatic Audio Captioning using Attention weighted Event based Embeddings
Swapnil Bhosale
Rupayan Chakraborty
Sunil Kumar Kopparapu
26
0
0
28 Jan 2022
Beyond Simple Meta-Learning: Multi-Purpose Models for Multi-Domain, Active and Continual Few-Shot Learning
Peyman Bateni
Jarred Barber
Raghav Goyal
Vaden Masrani
Jan Willem van de Meent
Leonid Sigal
Frank D. Wood
BDL
VLM
42
9
0
13 Jan 2022
Technical Language Supervision for Intelligent Fault Diagnosis in Process Industry
Karl Löwenmark
C. Taal
S. Schnabel
Marcus Liwicki
Fredrik Sandin
13
7
0
11 Dec 2021
Multimodal Fake News Detection
Santiago Alonso-Bartolome
Isabel Segura-Bedmar
17
60
0
09 Dec 2021
Neural Attention for Image Captioning: Review of Outstanding Methods
Zanyar Zohourianshahzadi
Jugal Kalita
VLM
19
45
0
29 Nov 2021
Multi-Glimpse Network: A Robust and Efficient Classification Architecture based on Recurrent Downsampled Attention
S. Tan
Runpei Dong
Kaisheng Ma
22
2
0
03 Nov 2021
Deep Learning in Human Activity Recognition with Wearable Sensors: A Review on Advances
Shibo Zhang
Yaxuan Li
Shen Zhang
Farzad Shahabi
S. Xia
Yuanbei Deng
N. Alshurafa
BDL
20
295
0
31 Oct 2021
End-to-End Supermask Pruning: Learning to Prune Image Captioning Models
J. Tan
C. Chan
Joon Huang Chuah
VLM
49
16
0
07 Oct 2021
Learning Structural Representations for Recipe Generation and Food Retrieval
Hao Wang
Guosheng Lin
S. Hoi
C. Miao
16
28
0
04 Oct 2021
Similar Scenes arouse Similar Emotions: Parallel Data Augmentation for Stylized Image Captioning
Guodun Li
Yuchen Zhai
Zehao Lin
Yin Zhang
43
21
0
26 Aug 2021
INVIGORATE: Interactive Visual Grounding and Grasping in Clutter
Hanbo Zhang
Yunfan Lu
Cunjun Yu
David Hsu
Xuguang Lan
Nanning Zheng
LM&Ro
18
63
0
25 Aug 2021
Explainable Reinforcement Learning for Broad-XAI: A Conceptual Framework and Survey
Richard Dazeley
Peter Vamplew
Francisco Cruz
24
59
0
20 Aug 2021