SPICE: Semantic Propositional Image Caption Evaluation

29 July 2016

Papers citing "SPICE: Semantic Propositional Image Caption Evaluation"

50 / 1,002 papers shown

Self-Supervised Image Captioning with CLIP

Chuanyang Jin

VLM SSL

210

26 Jun 2023

Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

International Conference on Learning Representations (ICLR), 2023

Fuxiao Liu

454

412

26 Jun 2023

Improving Reference-based Distinctive Image Captioning with Contrastive Rewards

211

25 Jun 2023

An overview on the evaluated video retrieval tasks at TRECVID 2022

...

22 Jun 2023

SituatedGen: Incorporating Geographical and Temporal Contexts into Generative Commonsense Reasoning

Neural Information Processing Systems (NeurIPS), 2023

Yunxiang Zhang

Xiaojun Wan

AILaw LRM

232

21 Jun 2023

Learning to Generate Better Than Your LLM

273

20 Jun 2023

Improving Image Captioning Descriptiveness by Ranking and LLM-based Fusion

268

20 Jun 2023

Improving Audio Caption Fluency with Automatic Error Correction

140

16 Jun 2023

Listener Model for the PhotoBook Referential Game with CLIPScores as Implicit Reference Chain

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Shih-Lun Wu

Yi-Hui Chou

Liang Li

152

16 Jun 2023

Top-Down Framework for Weakly-supervised Grounded Image Captioning

Yi Wang

235

13 Jun 2023

Embodied Executable Policy Learning with Language-based Scene Summarization

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

Ding Zhao

156

09 Jun 2023

Towards Adaptable and Interactive Image Captioning with Data Augmentation and Episodic Memory

Aliki Anagnostopoulou

Mareike Hartmann

Daniel Sonntag

CLL VLM

192

06 Jun 2023

SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning

346

06 Jun 2023

Enhance Temporal Relations in Audio Captioning with Sound Event Detection

Interspeech (Interspeech), 2023

235

02 Jun 2023

Adapting a ConvNeXt model to audio classification on AudioSet

Interspeech (Interspeech), 2023

Thomas Pellegrini

Ismail Khalfaoui-Hassani

Etienne Labbé

T. Masquelier

191

01 Jun 2023

CapText: Large Language Model-based Caption Generation From Image Context and Description

Shinjini Ghosh

Sagnik Anupam

VLM

329

01 Jun 2023

DisCLIP: Open-Vocabulary Referring Expression Generation

British Machine Vision Conference (BMVC), 2023

269

30 May 2023

Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning

Interspeech (Interspeech), 2023

168

30 May 2023

FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

328

27 May 2023

Learning to Imagine: Visually-Augmented Natural Language Generation

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

428

26 May 2023

Text-to-Motion Retrieval: Towards Joint Understanding of Human Motion Data and Natural Language

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023

231

25 May 2023

Visual Programming for Text-to-Image Generation and Evaluation

390

24 May 2023

Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

165

24 May 2023

#REVAL: a semantic evaluation framework for hashtag recommendation

IEEE Transactions on Knowledge and Data Engineering (TKDE), 2023

Areej Alsini

D. Huynh

A. Datta

24 May 2023

Gender Biases in Automatic Evaluation Metrics for Image Captioning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

416

24 May 2023

If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection

237

22 May 2023

GEST: the Graph of Events in Space and Time as a Common Representation between Vision and Language

193

22 May 2023

A request for clarity over the End of Sequence token in the Self-Critical Sequence Training

International Conference on Image Analysis and Processing (ICIAP), 2023

J. Hu

Roberto Cavicchioli

Alessandro Capotondi

271

20 May 2023

What Makes for Good Visual Tokenizers for Large Language Models?

Ying Shan

291

20 May 2023

DiffCap: Exploring Continuous Diffusion on Image Captioning

Zefan Cai

205

20 May 2023

PASTS: Progress-Aware Spatio-Temporal Transformer Speaker For Vision-and-Language Navigation

Engineering Applications of Artificial Intelligence (Eng. Appl. Artif. Intell.), 2023

216

19 May 2023

LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation

Neural Information Processing Systems (NeurIPS), 2023

431

18 May 2023

Listen, Think, and Understand

International Conference on Learning Representations (ICLR), 2023

699

221

18 May 2023

Foundations of Spatial Perception for Robotics: Hierarchical Representations and Real-time Systems

264

11 May 2023

Simple Token-Level Confidence Improves Caption Correctness

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

245

11 May 2023

InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Anwen Hu

Shizhe Chen

Liang Zhang

Qin Jin

237

10 May 2023

Transforming Visual Scene Graphs to Image Captions

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

372

03 May 2023

Diverse and Vivid Sound Generation from Text Descriptions

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

191

03 May 2023

Visual Transformation Telling

256

03 May 2023

Multimodal Data Augmentation for Image Captioning using Diffusion Models

207

03 May 2023

Multitask learning in Audio Captioning: a sentence embedding regression loss acts as a regularizer

European Signal Processing Conference (EUSIPCO), 2023

Etienne Labbé

J. Pinquier

Thomas Pellegrini

211

02 May 2023

VPGTrans: Transfer Visual Prompt Generator across LLMs

Neural Information Processing Systems (NeurIPS), 2023

Ao Zhang

Hao Fei

Yuan Yao

Wei Ji

Li Li

Zhiyuan Liu

Tat-Seng Chua

MLLM VLM

211

101

02 May 2023

Quality-agnostic Image Captioning to Safely Assist People with Vision Impairment

International Joint Conference on Artificial Intelligence (IJCAI), 2023

Lu Yu

Malvina Nikandrou

Jiali Jin

Verena Rieser

158

28 Apr 2023

From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping

International Joint Conference on Artificial Intelligence (IJCAI), 2023

301

26 Apr 2023

A Review of Deep Learning for Video Captioning

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

...

Fatih Porikli

226

22 Apr 2023

VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

401

154

17 Apr 2023

Tractable Control for Autoregressive Language Generation

International Conference on Machine Learning (ICML), 2023

436

15 Apr 2023

A-CAP: Anticipation Captioning with Commonsense Knowledge

Computer Vision and Pattern Recognition (CVPR), 2023

161

13 Apr 2023

Model-Agnostic Gender Debiased Image Captioning

Computer Vision and Pattern Recognition (CVPR), 2023

339

07 Apr 2023

Graph Attention for Automated Audio Captioning

IEEE Signal Processing Letters (IEEE SPL), 2023

209

07 Apr 2023