Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures

15 January 2016
Raffaella Bernardi
Ruken Cakici
Desmond Elliott
Aykut Erdem
Erkut Erdem
Nazli Ikizler-Cinbis
Frank Keller
A. Muscat
Barbara Plank
    EGVM
    VLM

Papers citing "Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures"

40 / 40 papers shown
LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning
Joy Lim Jia Yin
Daniel Zhang-Li
Jifan Yu
H. Li
Shangqing Tu
...
Zhiyuan Liu
Huiqin Liu
Lei Hou
Juanzi Li
Bin Xu
24
0
0
04 May 2025
ChatBEV: A Visual Language Model that Understands BEV Maps
Qingyao Xu
S. Chen
Guang Chen
Yanfeng Wang
Y. Zhang
46
0
0
18 Mar 2025
Understanding Visual Arts Experiences of Blind People
Franklin Mingzhe Li
Lotus Zhang
Maryam Bandukda
Abigale Stangl
Kristen Shinohara
Leah Findlater
Patrick Carrington
24
41
0
30 Jan 2023
Keyword localisation in untranscribed speech using visually grounded speech models
Kayode Olaleye
Dan Oneaţă
Herman Kamper
19
7
0
02 Feb 2022
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages
Emanuele Bugliarello
Fangyu Liu
Jonas Pfeiffer
Siva Reddy
Desmond Elliott
E. Ponti
Ivan Vulić
MLLM
VLM
ELM
40
62
0
27 Jan 2022
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
55
254
0
14 Jul 2021
Learning to Predict Visual Attributes in the Wild
Khoi Pham
Kushal Kafle
Zhe-nan Lin
Zhi Ding
Scott D. Cohen
Q. Tran
Abhinav Shrivastava
16
107
0
17 Jun 2021
Selective Replay Enhances Learning in Online Continual Analogical Reasoning
Tyler L. Hayes
Christopher Kanan
CLL
16
20
0
06 Mar 2021
MultiSubs: A Large-scale Multimodal and Multilingual Dataset
Josiah Wang
Pranava Madhyastha
J. Figueiredo
Chiraag Lala
Lucia Specia
VGen
14
11
0
02 Mar 2021
Diagnostic Captioning: A Survey
John Pavlopoulos
Vasiliki Kougia
Ion Androutsopoulos
D. Papamichail
3DV
MedIm
89
26
0
18 Jan 2021
Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze
Ece Takmaz
Sandro Pezzelle
Lisa Beinborn
Raquel Fernández
27
22
0
09 Nov 2020
Refer, Reuse, Reduce: Generating Subsequent References in Visual and Conversational Contexts
Ece Takmaz
Mario Giulianelli
Sandro Pezzelle
Arabella J. Sinclair
Raquel Fernández
15
26
0
09 Nov 2020
Fine-Grained Grounding for Multimodal Speech Recognition
Tejas Srinivasan
Ramon Sanabria
Florian Metze
Desmond Elliott
19
11
0
05 Oct 2020
Evaluation of Text Generation: A Survey
Asli Celikyilmaz
Elizabeth Clark
Jianfeng Gao
ELM
LM&MA
19
376
0
26 Jun 2020
Multimodal Categorization of Crisis Events in Social Media
Mahdi Abavisani
Liwei Wu
Shengli Hu
Joel R. Tetreault
A. Jaimes
21
87
0
10 Apr 2020
Deep Multimodal Image-Text Embeddings for Automatic Cross-Media Retrieval
Hadi Abdi Khojasteh
Ebrahim Ansari
Parvin Razzaghi
Akbar Karimi
VLM
6
4
0
23 Feb 2020
Multimodal Machine Translation through Visuals and Speech
U. Sulubacak
Ozan Caglayan
Stig-Arne Grönroos
Aku Rouhe
Desmond Elliott
Lucia Specia
Jörg Tiedemann
39
72
0
28 Nov 2019
REMIND Your Neural Network to Prevent Catastrophic Forgetting
Tyler L. Hayes
Kushal Kafle
Robik Shrestha
Manoj Acharya
Christopher Kanan
CLL
29
294
0
06 Oct 2019
Compositional Generalization in Image Captioning
Mitja Nikolaus
Mostafa Abdou
Matthew Lamm
Rahul Aralikatte
Desmond Elliott
CoGe
21
49
0
10 Sep 2019
MeetUp! A Corpus of Joint Activity Dialogues in a Visual Environment
N. Ilinykh
Sina Zarrieß
David Schlangen
19
43
0
11 Jul 2019
Towards Task Understanding in Visual Settings
Sebastin Santy
W. Zulfikar
Rishabh Mehrotra
Emine Yilmaz
27
1
0
28 Nov 2018
Pre-gen metrics: Predicting caption quality metrics without generating captions
Marc Tanti
Albert Gatt
K. Camilleri
14
2
0
12 Oct 2018
A Comprehensive Survey of Deep Learning for Image Captioning
Md. Zakir Hossain
Ferdous Sohel
M. Shiratuddin
Hamid Laga
VLM
3DV
28
760
0
06 Oct 2018
Context-Dependent Diffusion Network for Visual Relationship Detection
Zhen Cui
Chunyan Xu
Wenming Zheng
Jian Yang
GNN
12
50
0
11 Sep 2018
LUCSS: Language-based User-customized Colourization of Scene Sketches
C. Zou
Haoran Mo
Ruofei Du
Xing Wu
Chengying Gao
Hongbo Fu
22
8
0
30 Aug 2018
Object Relation Detection Based on One-shot Learning
Li Zhou
Jian-jun Zhao
Jianshu Li
Li-xin Yuan
Jiashi Feng
ObjD
14
23
0
16 Jul 2018
Semantic speech retrieval with a visually grounded model of untranscribed speech
Herman Kamper
Gregory Shakhnarovich
Karen Livescu
21
53
0
05 Oct 2017
Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning
Yang Xian
Yingli Tian
VLM
21
22
0
15 Sep 2017
Fluency-Guided Cross-Lingual Image Captioning
Weiyu Lan
Xirong Li
Jianfeng Dong
19
92
0
15 Aug 2017
What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?
Marc Tanti
Albert Gatt
K. Camilleri
16
56
0
07 Aug 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
13
2,856
0
26 May 2017
An Analysis of Action Recognition Datasets for Language and Vision Tasks
Spandana Gella
Frank Keller
ObjD
14
11
0
24 Apr 2017
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
Y. Jang
Yale Song
Youngjae Yu
Youngjin Kim
Gunhee Kim
19
545
0
14 Apr 2017
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
Albert Gatt
E. Krahmer
LM&MA
ELM
18
809
0
29 Mar 2017
Where to put the Image in an Image Caption Generator
Marc Tanti
Albert Gatt
K. Camilleri
39
96
0
27 Mar 2017
Visual Translation Embedding Network for Visual Relation Detection
Hanwang Zhang
Zawlin Kyaw
Shih-Fu Chang
Tat-Seng Chua
ViT
140
560
0
27 Feb 2017
Representations of language in a model of visually grounded speech signal
Grzegorz Chrupała
Lieke Gelderloos
A. Alishahi
22
131
0
07 Feb 2017
phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning
Y. Tan
Chee Seng Chan
VLM
11
29
0
20 Aug 2016
SPICE: Semantic Propositional Image Caption Evaluation
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
29
1,883
0
29 Jul 2016
Multi30K: Multilingual English-German Image Descriptions
Desmond Elliott
Stella Frank
K. Sima'an
Lucia Specia
VLM
22
579
0
02 May 2016