SPICE: Semantic Propositional Image Caption Evaluation

29 July 2016

Papers citing "SPICE: Semantic Propositional Image Caption Evaluation"

50 / 1,002 papers shown

A survey on knowledge-enhanced multimodal learning

Artificial Intelligence Review (Artif Intell Rev), 2022

Maria Lymperaiou

Giorgos Stamou

477

19 Nov 2022

Impact of visual assistance for automated audio captioning

Wim Boes

Hugo Van hamme

209

18 Nov 2022

I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision

IEEE International Conference on Computer Vision (ICCV), 2022

354

17 Nov 2022

Progressive Tree-Structured Prototype Network for End-to-End Image Captioning

ACM Multimedia (ACM MM), 2022

Pengpeng Zeng

Jinkuan Zhu

Jingkuan Song

Lianli Gao

VLM

186

17 Nov 2022

CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge

The Web Conference (WWW), 2022

Linli Yao

Wei Chen

Qin Jin

VLM

341

17 Nov 2022

PromptCap: Prompt-Guided Task-Aware Image Captioning

Weijia Shi

413

128

15 Nov 2022

Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling Approaches

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Daniel Fried

250

15 Nov 2022

Will Large-scale Generative Models Corrupt Future Datasets?

IEEE International Conference on Computer Vision (ICCV), 2022

Ryuichiro Hataya

Han Bao

Hiromi Arai

245

15 Nov 2022

Is my automatic audio captioning system so bad? spider-max: a metric to consider several caption candidates

Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2022

Etienne Labbé

Thomas Pellegrini

J. Pinquier

119

14 Nov 2022

Large-Scale Bidirectional Training for Zero-Shot Image Captioning

220

13 Nov 2022

Investigations in Audio Captioning: Addressing Vocabulary Imbalance and Evaluating Suitability of Language-Centric Performance Metrics

Sandeep Reddy Kothinti

Dimitra Emmanouilidou

255

12 Nov 2022

Exploring Train and Test-Time Augmentations for Audio-Language Learning

172

31 Oct 2022

DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention

ACM Transactions on Knowledge Discovery from Data (TKDD), 2021

Xuancheng Ren

Yuexian Zou

209

28 Oct 2022

Visual Semantic Parsing: From Images to Abstract Meaning Representation

Conference on Computational Natural Language Learning (CoNLL), 2022

Kalliopi Basioti

Vladimir Pavlovic

270

26 Oct 2022

Retrieval Augmentation for Commonsense Reasoning: A Unified Approach

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

207

23 Oct 2022

Metric-guided Distillation: Distilling Knowledge from the Metric to Ranker and Retriever for Generative Commonsense Reasoning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

152

21 Oct 2022

Image-Text Retrieval with Binary and Continuous Label Supervision

Caili Guo

203

20 Oct 2022

Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

193

20 Oct 2022

Prophet Attention: Predicting Attention with Future Attention for Image Captioning

Neural Information Processing Systems (NeurIPS), 2022

Xuancheng Ren

Yuexian Zou

234

19 Oct 2022

Probing Cross-modal Semantics Alignment Capability from the Textual Perspective

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

182

18 Oct 2022

Social Biases in Automatic Evaluation Metrics for NLG

Mingqi Gao

Xiaojun Wan

204

17 Oct 2022

SGRAM: Improving Scene Graph Parsing via Abstract Meaning Representation

215

17 Oct 2022

EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

211

14 Oct 2022

Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022

313

14 Oct 2022

Automated Audio Captioning via Fusion of Low- and High- Dimensional Features

189

10 Oct 2022

CHARD: Clinical Health-Aware Reasoning Across Dimensions for Text Generation Models

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022

172

09 Oct 2022

VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment

325

09 Oct 2022

Visualize Before You Write: Imagination-Guided Open-Ended Text Generation

Findings (Findings), 2022

324

07 Oct 2022

Progressive Text-to-Image Generation

Zhengcong Fei

Mingyuan Fan

Li Zhu

Junshi Huang

330

05 Oct 2022

Vision+X: A Survey on Multimodal Learning in the Light of Data

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

Ye Zhu

Yuehua Wu

Andrii Zadaianchuk

Yan Yan

364

05 Oct 2022

Affection: Learning Affective Explanations for Real-World Visual Data

Computer Vision and Pattern Recognition (CVPR), 2022

183

04 Oct 2022

Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning

International Journal of Computer Vision (IJCV), 2022

Jianfei Cai

274

04 Oct 2022

Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

Rajkumar Ramamurthy

Prithviraj Ammanabrolu

Yejin Choi

586

280

03 Oct 2022

Text-to-Audio Grounding Based Novel Metric for Evaluating Audio Caption Similarity

Swapnil Bhosale

Rupayan Chakraborty

Sunil Kumar Kopparapu

175

03 Oct 2022

SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation

Computer Vision and Pattern Recognition (CVPR), 2022

R. Ramos

Bruno Martins

Desmond Elliott

Yova Kementchedjhieva

VLM

206

121

30 Sep 2022

Medical Image Captioning via Generative Pretrained Transformers

Scientific Reports (Sci Rep), 2022

200

28 Sep 2022

Paraphrasing Is All You Need for Novel Object Captioning

Neural Information Processing Systems (NeurIPS), 2022

Louis-Philippe Morency

Yu-Chiang Frank Wang

185

25 Sep 2022

DRAMA: Joint Risk Localization and Captioning in Driving

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022

320

154

22 Sep 2022

Assessing ASR Model Quality on Disordered Speech using BERTScore

Jimmy Tobin

Qisheng Li

Subhashini Venugopalan

Katie Seaver

Richard Cave

Katrin Tomanek

157

21 Sep 2022

Show, Interpret and Tell: Entity-aware Contextualised Image Captioning in Wikipedia

AAAI Conference on Artificial Intelligence (AAAI), 2022

197

21 Sep 2022

Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering

IEEE Transactions on Image Processing (IEEE TIP), 2022

Hao Li

Qi Wu

382

21 Sep 2022

Learning Distinct and Representative Styles for Image Captioning

Neural Information Processing Systems (NeurIPS), 2022

Qi Chen

Chaorui Deng

Qi Wu

VLM

187

17 Sep 2022

Belief Revision based Caption Re-ranker with Visual Semantic Information

International Conference on Computational Linguistics (COLING), 2022

Ahmed Sabir

Francesc Moreno-Noguer

Pranava Madhyastha

Lluís Padró

BDL

209

16 Sep 2022

Distribution Aware Metrics for Conditional Natural Language Generation

International Conference on Language Resources and Evaluation (LREC), 2022

David M. Chan

Yiming Ni

David A. Ross

Sudheendra Vijayanarasimhan

Austin Myers

John F. Canny

363

15 Sep 2022

PreSTU: Pre-Training for Scene-Text Understanding

IEEE International Conference on Computer Vision (ICCV), 2022

Wei-Lun Chao

350

12 Sep 2022

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

ACM Computing Surveys (ACM CSUR), 2022

Paul Pu Liang

Amir Zadeh

Louis-Philippe Morency

315

169

07 Sep 2022

On Grounded Planning for Embodied Tasks with Language Models

AAAI Conference on Artificial Intelligence (AAAI), 2022

Xiang Ren

366

29 Aug 2022

Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment

British Machine Vision Conference (BMVC), 2022

311

29 Aug 2022

On Reality and the Limits of Language Data: Aligning LLMs with Human Norms

Annual Meeting of the Cognitive Science Society (CogSci), 2022

Nigel Collier

Fangyu Liu

Ehsan Shareghi

232

25 Aug 2022

An investigation on selecting audio pre-trained models for audio captioning

Peiran Yan

Sheng-Wei Li

137

12 Aug 2022