ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1607.08822
  4. Cited By
SPICE: Semantic Propositional Image Caption Evaluation

SPICE: Semantic Propositional Image Caption Evaluation

29 July 2016
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
    EGVM
ArXiv (abs)PDFHTML

Papers citing "SPICE: Semantic Propositional Image Caption Evaluation"

50 / 1,002 papers shown
A survey on knowledge-enhanced multimodal learning
A survey on knowledge-enhanced multimodal learningArtificial Intelligence Review (Artif Intell Rev), 2022
Maria Lymperaiou
Giorgos Stamou
477
23
0
19 Nov 2022
Impact of visual assistance for automated audio captioning
Impact of visual assistance for automated audio captioning
Wim Boes
Hugo Van hamme
209
1
0
18 Nov 2022
I Can't Believe There's No Images! Learning Visual Tasks Using only
  Language Supervision
I Can't Believe There's No Images! Learning Visual Tasks Using only Language SupervisionIEEE International Conference on Computer Vision (ICCV), 2022
Sophia Gu
Christopher Clark
Aniruddha Kembhavi
VLM
354
36
0
17 Nov 2022
Progressive Tree-Structured Prototype Network for End-to-End Image
  Captioning
Progressive Tree-Structured Prototype Network for End-to-End Image CaptioningACM Multimedia (ACM MM), 2022
Pengpeng Zeng
Jinkuan Zhu
Jingkuan Song
Lianli Gao
VLM
186
34
0
17 Nov 2022
CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal
  Pre-trained Knowledge
CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained KnowledgeThe Web Conference (WWW), 2022
Linli Yao
Wei Chen
Qin Jin
VLM
341
11
0
17 Nov 2022
PromptCap: Prompt-Guided Task-Aware Image Captioning
PromptCap: Prompt-Guided Task-Aware Image Captioning
Yushi Hu
Hang Hua
Zhengyuan Yang
Weijia Shi
Noah A. Smith
Jiebo Luo
413
128
0
15 Nov 2022
Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling
  Approaches
Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling ApproachesConference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Daniel Fried
Nicholas Tomlin
Jennifer Hu
Roma Patel
Aida Nematzadeh
250
9
0
15 Nov 2022
Will Large-scale Generative Models Corrupt Future Datasets?
Will Large-scale Generative Models Corrupt Future Datasets?IEEE International Conference on Computer Vision (ICCV), 2022
Ryuichiro Hataya
Han Bao
Hiromi Arai
245
71
0
15 Nov 2022
Is my automatic audio captioning system so bad? spider-max: a metric to
  consider several caption candidates
Is my automatic audio captioning system so bad? spider-max: a metric to consider several caption candidatesWorkshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2022
Etienne Labbé
Thomas Pellegrini
J. Pinquier
119
5
0
14 Nov 2022
Large-Scale Bidirectional Training for Zero-Shot Image Captioning
Large-Scale Bidirectional Training for Zero-Shot Image Captioning
Taehoon Kim
Mark A Marsden
Pyunghwan Ahn
Sangyun Kim
Sihaeng Lee
Alessandra Sala
S. Kim
VLM
220
5
0
13 Nov 2022
Investigations in Audio Captioning: Addressing Vocabulary Imbalance and
  Evaluating Suitability of Language-Centric Performance Metrics
Investigations in Audio Captioning: Addressing Vocabulary Imbalance and Evaluating Suitability of Language-Centric Performance Metrics
Sandeep Reddy Kothinti
Dimitra Emmanouilidou
255
3
0
12 Nov 2022
Exploring Train and Test-Time Augmentations for Audio-Language Learning
Exploring Train and Test-Time Augmentations for Audio-Language Learning
Eungbeom Kim
Jinhee Kim
Yoori Oh
Kyungsu Kim
Minju Park
Jaeheon Sim
J. Lee
Kyogu Lee
172
16
0
31 Oct 2022
DiMBERT: Learning Vision-Language Grounded Representations with
  Disentangled Multimodal-Attention
DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-AttentionACM Transactions on Knowledge Discovery from Data (TKDD), 2021
Fenglin Liu
Xian Wu
Shen Ge
Xuancheng Ren
Wei Fan
Xu Sun
Yuexian Zou
VLM
209
13
0
28 Oct 2022
Visual Semantic Parsing: From Images to Abstract Meaning Representation
Visual Semantic Parsing: From Images to Abstract Meaning RepresentationConference on Computational Natural Language Learning (CoNLL), 2022
M. A. Abdelsalam
Zhan Shi
Federico Fancellu
Kalliopi Basioti
Dhaivat Bhatt
Vladimir Pavlovic
Afsaneh Fazly
GNN
270
6
0
26 Oct 2022
Retrieval Augmentation for Commonsense Reasoning: A Unified Approach
Retrieval Augmentation for Commonsense Reasoning: A Unified ApproachConference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Wenhao Yu
Chenguang Zhu
Zhihan Zhang
Shuohang Wang
Zhuosheng Zhang
Yuwei Fang
Meng Jiang
LRMReLM
207
20
0
23 Oct 2022
Metric-guided Distillation: Distilling Knowledge from the Metric to
  Ranker and Retriever for Generative Commonsense Reasoning
Metric-guided Distillation: Distilling Knowledge from the Metric to Ranker and Retriever for Generative Commonsense ReasoningConference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Xingwei He
Yeyun Gong
Alex Jin
Weizhen Qi
Hang Zhang
Jian Jiao
Bartuer Zhou
Biao Cheng
Sm Yiu
Nan Duan
152
12
0
21 Oct 2022
Image-Text Retrieval with Binary and Continuous Label Supervision
Image-Text Retrieval with Binary and Continuous Label Supervision
Zheng Li
Caili Guo
Zerun Feng
Lei Li
Ying Jin
Yufeng Zhang
VLM
203
5
0
20 Oct 2022
Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text
  Generation
Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text GenerationConference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yu Zhao
Jianguo Wei
Zhichao Lin
Yueheng Sun
Meishan Zhang
Hao Fei
193
17
0
20 Oct 2022
Prophet Attention: Predicting Attention with Future Attention for Image
  Captioning
Prophet Attention: Predicting Attention with Future Attention for Image CaptioningNeural Information Processing Systems (NeurIPS), 2022
Fenglin Liu
Xuancheng Ren
Xian Wu
Wei Fan
Yuexian Zou
Xu Sun
234
52
0
19 Oct 2022
Probing Cross-modal Semantics Alignment Capability from the Textual
  Perspective
Probing Cross-modal Semantics Alignment Capability from the Textual PerspectiveConference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zheng Ma
Shi Zong
Mianzhi Pan
Jianbing Zhang
Shujian Huang
Xinyu Dai
Jiajun Chen
182
5
0
18 Oct 2022
Social Biases in Automatic Evaluation Metrics for NLG
Social Biases in Automatic Evaluation Metrics for NLG
Mingqi Gao
Xiaojun Wan
204
4
0
17 Oct 2022
SGRAM: Improving Scene Graph Parsing via Abstract Meaning Representation
SGRAM: Improving Scene Graph Parsing via Abstract Meaning Representation
Woo Suk Choi
Y. Heo
Byoung-Tak Zhang
GNN
215
4
0
17 Oct 2022
EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge
  Distillation and Modal-adaptive Pruning
EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive PruningAnnual Meeting of the Association for Computational Linguistics (ACL), 2022
Tiannan Wang
Wangchunshu Zhou
Yan Zeng
Xinsong Zhang
VLM
211
64
0
14 Oct 2022
Plausible May Not Be Faithful: Probing Object Hallucination in
  Vision-Language Pre-training
Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-trainingConference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Wenliang Dai
Zihan Liu
Ziwei Ji
Jane Polak Scowcroft
Pascale Fung
MLLMVLM
313
76
0
14 Oct 2022
Automated Audio Captioning via Fusion of Low- and High- Dimensional
  Features
Automated Audio Captioning via Fusion of Low- and High- Dimensional Features
Jianyuan Sun
Xubo Liu
Xinhao Mei
Mark D. Plumbley
V. Kılıç
Wenwu Wang
189
3
0
10 Oct 2022
CHARD: Clinical Health-Aware Reasoning Across Dimensions for Text
  Generation Models
CHARD: Clinical Health-Aware Reasoning Across Dimensions for Text Generation ModelsConference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Steven Y. Feng
Vivek Khetan
Bogdan Sacaleanu
A. Gershman
Eduard H. Hovy
LRM
172
14
0
09 Oct 2022
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature
  Alignment
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment
Shraman Pramanick
Li Jing
Sayan Nag
Jiachen Zhu
Hardik Shah
Yann LeCun
Ramalingam Chellappa
325
26
0
09 Oct 2022
Visualize Before You Write: Imagination-Guided Open-Ended Text
  Generation
Visualize Before You Write: Imagination-Guided Open-Ended Text GenerationFindings (Findings), 2022
Wanrong Zhu
An Yan
Yujie Lu
Wenda Xu
Xinze Wang
Miguel P. Eckstein
William Yang Wang
324
38
0
07 Oct 2022
Progressive Text-to-Image Generation
Progressive Text-to-Image Generation
Zhengcong Fei
Mingyuan Fan
Li Zhu
Junshi Huang
330
4
0
05 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
Vision+X: A Survey on Multimodal Learning in the Light of DataIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Ye Zhu
Yuehua Wu
Andrii Zadaianchuk
Yan Yan
364
39
0
05 Oct 2022
Affection: Learning Affective Explanations for Real-World Visual Data
Affection: Learning Affective Explanations for Real-World Visual DataComputer Vision and Pattern Recognition (CVPR), 2022
Panos Achlioptas
M. Ovsjanikov
Leonidas Guibas
Sergey Tulyakov
183
26
0
04 Oct 2022
Learning to Collocate Visual-Linguistic Neural Modules for Image
  Captioning
Learning to Collocate Visual-Linguistic Neural Modules for Image CaptioningInternational Journal of Computer Vision (IJCV), 2022
Xu Yang
Hanwang Zhang
Chongyang Gao
Jianfei Cai
MLLM
274
10
0
04 Oct 2022
Is Reinforcement Learning (Not) for Natural Language Processing:
  Benchmarks, Baselines, and Building Blocks for Natural Language Policy
  Optimization
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Rajkumar Ramamurthy
Prithviraj Ammanabrolu
Kianté Brantley
Jack Hessel
R. Sifa
Christian Bauckhage
Hannaneh Hajishirzi
Yejin Choi
OffRL
586
280
0
03 Oct 2022
Text-to-Audio Grounding Based Novel Metric for Evaluating Audio Caption
  Similarity
Text-to-Audio Grounding Based Novel Metric for Evaluating Audio Caption Similarity
Swapnil Bhosale
Rupayan Chakraborty
Sunil Kumar Kopparapu
175
2
0
03 Oct 2022
SmallCap: Lightweight Image Captioning Prompted with Retrieval
  Augmentation
SmallCap: Lightweight Image Captioning Prompted with Retrieval AugmentationComputer Vision and Pattern Recognition (CVPR), 2022
R. Ramos
Bruno Martins
Desmond Elliott
Yova Kementchedjhieva
VLM
206
121
0
30 Sep 2022
Medical Image Captioning via Generative Pretrained Transformers
Medical Image Captioning via Generative Pretrained TransformersScientific Reports (Sci Rep), 2022
Alexander Selivanov
Oleg Y. Rogov
Daniil Chesakov
Artem Shelmanov
Irina Fedulova
Dmitry V. Dylov
MedIm
200
91
0
28 Sep 2022
Paraphrasing Is All You Need for Novel Object Captioning
Paraphrasing Is All You Need for Novel Object CaptioningNeural Information Processing Systems (NeurIPS), 2022
Cheng Yang
Yifan Hao
Wanshu Fan
Ruslan Salakhutdinov
Louis-Philippe Morency
Yu-Chiang Frank Wang
185
6
0
25 Sep 2022
DRAMA: Joint Risk Localization and Captioning in Driving
DRAMA: Joint Risk Localization and Captioning in DrivingIEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Srikanth Malla
Chiho Choi
Isht Dwivedi
Joonhyang Choi
Jiachen Li
320
154
0
22 Sep 2022
Assessing ASR Model Quality on Disordered Speech using BERTScore
Assessing ASR Model Quality on Disordered Speech using BERTScore
Jimmy Tobin
Qisheng Li
Subhashini Venugopalan
Katie Seaver
Richard Cave
Katrin Tomanek
157
17
0
21 Sep 2022
Show, Interpret and Tell: Entity-aware Contextualised Image Captioning
  in Wikipedia
Show, Interpret and Tell: Entity-aware Contextualised Image Captioning in WikipediaAAAI Conference on Artificial Intelligence (AAAI), 2022
K. Nguyen
Ali Furkan Biten
Andrés Mafla
Lluís Gómez
Dimosthenis Karatzas
197
15
0
21 Sep 2022
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question
  Answering
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question AnsweringIEEE Transactions on Image Processing (IEEE TIP), 2022
Hao Li
Jinfa Huang
Peng Jin
Guoli Song
Qi Wu
Jie Chen
382
29
0
21 Sep 2022
Learning Distinct and Representative Styles for Image Captioning
Learning Distinct and Representative Styles for Image CaptioningNeural Information Processing Systems (NeurIPS), 2022
Qi Chen
Chaorui Deng
Qi Wu
VLM
187
27
0
17 Sep 2022
Belief Revision based Caption Re-ranker with Visual Semantic Information
Belief Revision based Caption Re-ranker with Visual Semantic InformationInternational Conference on Computational Linguistics (COLING), 2022
Ahmed Sabir
Francesc Moreno-Noguer
Pranava Madhyastha
Lluís Padró
BDL
209
2
0
16 Sep 2022
Distribution Aware Metrics for Conditional Natural Language Generation
Distribution Aware Metrics for Conditional Natural Language GenerationInternational Conference on Language Resources and Evaluation (LREC), 2022
David M. Chan
Yiming Ni
David A. Ross
Sudheendra Vijayanarasimhan
Austin Myers
John F. Canny
363
4
0
15 Sep 2022
PreSTU: Pre-Training for Scene-Text Understanding
PreSTU: Pre-Training for Scene-Text UnderstandingIEEE International Conference on Computer Vision (ICCV), 2022
Jihyung Kil
Soravit Changpinyo
Xi Chen
Hexiang Hu
Sebastian Goodman
Wei-Lun Chao
Radu Soricut
VLM
350
38
0
12 Sep 2022
Foundations and Trends in Multimodal Machine Learning: Principles,
  Challenges, and Open Questions
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open QuestionsACM Computing Surveys (ACM CSUR), 2022
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
315
169
0
07 Sep 2022
On Grounded Planning for Embodied Tasks with Language Models
On Grounded Planning for Embodied Tasks with Language ModelsAAAI Conference on Artificial Intelligence (AAAI), 2022
Bill Yuchen Lin
Chengsong Huang
Qian Liu
Wenda Gu
Sam Sommerer
Xiang Ren
LM&Ro
366
49
0
29 Aug 2022
Efficient Vision-Language Pretraining with Visual Concepts and
  Hierarchical Alignment
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical AlignmentBritish Machine Vision Conference (BMVC), 2022
Mustafa Shukor
Guillaume Couairon
Matthieu Cord
VLMCLIP
311
28
0
29 Aug 2022
On Reality and the Limits of Language Data: Aligning LLMs with Human
  Norms
On Reality and the Limits of Language Data: Aligning LLMs with Human NormsAnnual Meeting of the Cognitive Science Society (CogSci), 2022
Nigel Collier
Fangyu Liu
Ehsan Shareghi
232
3
0
25 Aug 2022
An investigation on selecting audio pre-trained models for audio
  captioning
An investigation on selecting audio pre-trained models for audio captioning
Peiran Yan
Sheng-Wei Li
137
0
0
12 Aug 2022
Previous
123...91011...192021
Next
Page 10 of 21
Pageof 21