Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1601.03896
Cited By
Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures
15 January 2016
Raffaella Bernardi
Ruken Cakici
Desmond Elliott
Aykut Erdem
Erkut Erdem
Nazli Ikizler-Cinbis
Frank Keller
A. Muscat
Barbara Plank
EGVM
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures"
40 / 40 papers shown
Title
LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning
Joy Lim Jia Yin
Daniel Zhang-Li
Jifan Yu
H. Li
Shangqing Tu
...
Zhiyuan Liu
Huiqin Liu
Lei Hou
Juanzi Li
Bin Xu
24
0
0
04 May 2025
ChatBEV: A Visual Language Model that Understands BEV Maps
Qingyao Xu
S. Chen
Guang Chen
Yanfeng Wang
Y. Zhang
44
0
0
18 Mar 2025
Understanding Visual Arts Experiences of Blind People
Franklin Mingzhe Li
Lotus Zhang
Maryam Bandukda
Abigale Stangl
Kristen Shinohara
Leah Findlater
Patrick Carrington
24
41
0
30 Jan 2023
Keyword localisation in untranscribed speech using visually grounded speech models
Kayode Olaleye
Dan Oneaţă
Herman Kamper
19
7
0
02 Feb 2022
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages
Emanuele Bugliarello
Fangyu Liu
Jonas Pfeiffer
Siva Reddy
Desmond Elliott
E. Ponti
Ivan Vulić
MLLM
VLM
ELM
40
62
0
27 Jan 2022
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
55
254
0
14 Jul 2021
Learning to Predict Visual Attributes in the Wild
Khoi Pham
Kushal Kafle
Zhe-nan Lin
Zhi Ding
Scott D. Cohen
Q. Tran
Abhinav Shrivastava
16
107
0
17 Jun 2021
Selective Replay Enhances Learning in Online Continual Analogical Reasoning
Tyler L. Hayes
Christopher Kanan
CLL
16
20
0
06 Mar 2021
MultiSubs: A Large-scale Multimodal and Multilingual Dataset
Josiah Wang
Pranava Madhyastha
J. Figueiredo
Chiraag Lala
Lucia Specia
VGen
14
11
0
02 Mar 2021
Diagnostic Captioning: A Survey
John Pavlopoulos
Vasiliki Kougia
Ion Androutsopoulos
D. Papamichail
3DV
MedIm
89
26
0
18 Jan 2021
Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze
Ece Takmaz
Sandro Pezzelle
Lisa Beinborn
Raquel Fernández
27
22
0
09 Nov 2020
Refer, Reuse, Reduce: Generating Subsequent References in Visual and Conversational Contexts
Ece Takmaz
Mario Giulianelli
Sandro Pezzelle
Arabella J. Sinclair
Raquel Fernández
15
26
0
09 Nov 2020
Fine-Grained Grounding for Multimodal Speech Recognition
Tejas Srinivasan
Ramon Sanabria
Florian Metze
Desmond Elliott
19
11
0
05 Oct 2020
Evaluation of Text Generation: A Survey
Asli Celikyilmaz
Elizabeth Clark
Jianfeng Gao
ELM
LM&MA
19
376
0
26 Jun 2020
Multimodal Categorization of Crisis Events in Social Media
Mahdi Abavisani
Liwei Wu
Shengli Hu
Joel R. Tetreault
A. Jaimes
21
87
0
10 Apr 2020
Deep Multimodal Image-Text Embeddings for Automatic Cross-Media Retrieval
Hadi Abdi Khojasteh
Ebrahim Ansari
Parvin Razzaghi
Akbar Karimi
VLM
6
4
0
23 Feb 2020
Multimodal Machine Translation through Visuals and Speech
U. Sulubacak
Ozan Caglayan
Stig-Arne Gronroos
Aku Rouhe
Desmond Elliott
Lucia Specia
Jörg Tiedemann
39
72
0
28 Nov 2019
REMIND Your Neural Network to Prevent Catastrophic Forgetting
Tyler L. Hayes
Kushal Kafle
Robik Shrestha
Manoj Acharya
Christopher Kanan
CLL
29
294
0
06 Oct 2019
Compositional Generalization in Image Captioning
Mitja Nikolaus
Mostafa Abdou
Matthew Lamm
Rahul Aralikatte
Desmond Elliott
CoGe
21
49
0
10 Sep 2019
MeetUp! A Corpus of Joint Activity Dialogues in a Visual Environment
N. Ilinykh
Sina Zarrieß
David Schlangen
19
43
0
11 Jul 2019
Towards Task Understanding in Visual Settings
Sebastin Santy
W. Zulfikar
Rishabh Mehrotra
Emine Yilmaz
27
1
0
28 Nov 2018
Pre-gen metrics: Predicting caption quality metrics without generating captions
Marc Tanti
Albert Gatt
K. Camilleri
14
2
0
12 Oct 2018
A Comprehensive Survey of Deep Learning for Image Captioning
Md. Zakir Hossain
Ferdous Sohel
M. Shiratuddin
Hamid Laga
VLM
3DV
28
760
0
06 Oct 2018
Context-Dependent Diffusion Network for Visual Relationship Detection
Zhen Cui
Chunyan Xu
Wenming Zheng
Jian Yang
GNN
12
50
0
11 Sep 2018
LUCSS: Language-based User-customized Colourization of Scene Sketches
C. Zou
Haoran Mo
Ruofei Du
Xing Wu
Chengying Gao
Hongbo Fu
22
8
0
30 Aug 2018
Object Relation Detection Based on One-shot Learning
Li Zhou
Jian-jun Zhao
Jianshu Li
Li-xin Yuan
Jiashi Feng
ObjD
14
23
0
16 Jul 2018
Semantic speech retrieval with a visually grounded model of untranscribed speech
Herman Kamper
Gregory Shakhnarovich
Karen Livescu
18
53
0
05 Oct 2017
Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning
Yang Xian
Yingli Tian
VLM
21
22
0
15 Sep 2017
Fluency-Guided Cross-Lingual Image Captioning
Weiyu Lan
Xirong Li
Jianfeng Dong
19
92
0
15 Aug 2017
What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?
Marc Tanti
Albert Gatt
K. Camilleri
16
56
0
07 Aug 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
13
2,856
0
26 May 2017
An Analysis of Action Recognition Datasets for Language and Vision Tasks
Spandana Gella
Frank Keller
ObjD
14
11
0
24 Apr 2017
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
Y. Jang
Yale Song
Youngjae Yu
Youngjin Kim
Gunhee Kim
19
545
0
14 Apr 2017
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
Albert Gatt
E. Krahmer
LM&MA
ELM
18
809
0
29 Mar 2017
Where to put the Image in an Image Caption Generator
Marc Tanti
Albert Gatt
K. Camilleri
39
96
0
27 Mar 2017
Visual Translation Embedding Network for Visual Relation Detection
Hanwang Zhang
Zawlin Kyaw
Shih-Fu Chang
Tat-Seng Chua
ViT
140
560
0
27 Feb 2017
Representations of language in a model of visually grounded speech signal
Grzegorz Chrupała
Lieke Gelderloos
A. Alishahi
22
131
0
07 Feb 2017
phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning
Y. Tan
Chee Seng Chan
VLM
11
29
0
20 Aug 2016
SPICE: Semantic Propositional Image Caption Evaluation
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
29
1,883
0
29 Jul 2016
Multi30K: Multilingual English-German Image Descriptions
Desmond Elliott
Stella Frank
K. Simaán
Lucia Specia
VLM
22
579
0
02 May 2016
1