Learning to Evaluate Image Captioning

17 June 2018

Papers citing "Learning to Evaluate Image Captioning"

50 / 74 papers shown

Title
G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o Tony Cheng Tong Sirui He Z. Shao Dit-Yan Yeung 65 3 0 18 Dec 2024
Neptune: The Long Orbit to Benchmarking Long Video Understanding Arsha Nagrani Mingda Zhang Ramin Mehran Rachel Hornung N. B. Gundavarapu ... Boqing Gong Cordelia Schmid Mikhail Sirotenko Yukun Zhu Tobias Weyand 98 4 0 12 Dec 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training Sara Sarto Nicholas Moratelli Marcella Cornia Lorenzo Baraldi Rita Cucchiara 23 3 0 09 Oct 2024
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning Kazuki Matsuda Yuiga Wada Komei Sugiura 21 0 0 28 Sep 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis Uri Berger Gabriel Stanovsky Omri Abend Lea Frermann 24 0 0 09 Aug 2024
HICEScore: A Hierarchical Metric for Image Captioning Evaluation Zequn Zeng Jianqiao Sun Hao Zhang Tiansheng Wen Yudi Su Yan Xie Zhengjue Wang Boli Chen 39 3 0 26 Jul 2024
Detecting Errors through Ensembling Prompts (DEEP): An End-to-End LLM Framework for Detecting Factual Errors Alex Chandler Devesh Surve Hui Su HILM UQCV 15 1 0 18 Jun 2024
AutoAD III: The Prequel -- Back to the Pixels Tengda Han Max Bain Arsha Nagrani Gül Varol Weidi Xie Andrew Zisserman VGen DiffM 36 20 0 22 Apr 2024
Can Feedback Enhance Semantic Grounding in Large Vision-Language Models? Yuan-Hong Liao Rafid Mahmood Sanja Fidler David Acuna VLM 44 7 0 09 Apr 2024
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning Yuiga Wada Kanta Kaneda Daichi Saito Komei Sugiura 22 24 0 28 Feb 2024
Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy Simon Ging M. A. Bravo Thomas Brox VLM 38 11 0 11 Feb 2024
Leveraging Large Language Models for NLG Evaluation: Advances and Challenges Zhen Li Xiaohan Xu Tao Shen Can Xu Jia-Chen Gu Yuxuan Lai Chongyang Tao Shuai Ma LM&MA ELM 26 9 0 13 Jan 2024
See, Say, and Segment: Teaching LMMs to Overcome False Premises Tsung-Han Wu Giscard Biamby David M. Chan Lisa Dunlap Ritwik Gupta Xudong Wang Joseph E. Gonzalez Trevor Darrell VLM MLLM 30 18 0 13 Dec 2023
CLAIR: Evaluating Image Captions with Large Language Models David M. Chan Suzanne Petryk Joseph E. Gonzalez Trevor Darrell John F. Canny 38 19 0 19 Oct 2023
ContextRef: Evaluating Referenceless Metrics For Image Description Generation Elisa Kreiss E. Zelikman Christopher Potts Nick Haber 8 5 0 21 Sep 2023
Linear Alignment of Vision-language Models for Image Captioning Fabian Paischer M. Hofmarcher Sepp Hochreiter Thomas Adler CLIP VLM 35 0 0 10 Jul 2023
BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks Kai Zhang Jun Yu Eashan Adhikarla Rong-Er Zhou Zhilin Yan ... Xun Chen Yong Chen Quanzheng Li Hongfang Liu Lichao Sun LM&MA MedIm 12 149 0 26 May 2023
Multimodal Data Augmentation for Image Captioning using Diffusion Models Changrong Xiao S. Xu Kunpeng Zhang DiffM 11 10 0 03 May 2023
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation Sara Sarto Manuele Barraco Marcella Cornia Lorenzo Baraldi Rita Cucchiara 8 55 0 21 Mar 2023
Test-Time Distribution Normalization for Contrastively Learned Vision-language Models Yi Zhou Juntao Ren Fengyu Li Ramin Zabih Ser-Nam Lim VLM 21 13 0 22 Feb 2023
Not All Errors are Equal: Learning Text Generation Metrics using Stratified Error Synthesis Wenda Xu Yi-Lin Tuan Yujie Lu Michael Stephen Saxon Lei Li William Yang Wang 31 22 0 10 Oct 2022
Affection: Learning Affective Explanations for Real-World Visual Data Panos Achlioptas M. Ovsjanikov Leonidas J. Guibas Sergey Tulyakov 59 10 0 04 Oct 2022
A Feature-space Multimodal Data Augmentation Technique for Text-video Retrieval Alex Falcon G. Serra O. Lanz VGen 15 25 0 03 Aug 2022
Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation Learning and Retrieval Keyu Wen Zhenshan Tan Qingrong Cheng Cheng Chen X. Gu VLM 6 0 0 02 Jul 2022
Measuring Representational Harms in Image Captioning Angelina Wang Solon Barocas Kristen Laird Hanna M. Wallach 11 51 0 14 Jun 2022
Mutual Information Divergence: A Unified Metric for Multimodal Generative Models Jin-Hwa Kim Yunji Kim Jiyoung Lee Kang Min Yoo Sang-Woo Lee EGVM 17 32 0 25 May 2022
RoViST:Learning Robust Metrics for Visual Storytelling Eileen Wang S. Han Josiah Poon 14 7 0 08 May 2022
Reproducibility Issues for BERT-based Evaluation Metrics Yanran Chen Jonas Belouadi Steffen Eger 27 16 0 30 Mar 2022
BERTHA: Video Captioning Evaluation Via Transfer-Learned Human Assessment Luis Lebron Yvette Graham Kevin McGuinness K. Kouramas Noel E. O'Connor 14 3 0 25 Jan 2022
Dynamic population-based meta-learning for multi-agent communication with natural language Abhinav Gupta Marc Lanctot Angeliki Lazaridou LLMAG 17 20 0 27 Oct 2021
Reason induced visual attention for explainable autonomous driving Sikai Chen Jiqian Dong Runjia Du Yujie Li S. Labi 29 0 0 11 Oct 2021
COSMic: A Coherence-Aware Generation Metric for Image Descriptions Mert Inan P. Sharma Baber Khalid Radu Soricut Matthew Stone Malihe Alikhani EGVM 16 13 0 11 Sep 2021
Problem Learning: Towards the Free Will of Machines Yongfeng Zhang FaML 13 2 0 01 Sep 2021
Automatic Text Evaluation through the Lens of Wasserstein Barycenters Pierre Colombo Guillaume Staerman Chloé Clavel Pablo Piantanida 14 41 0 27 Aug 2021
Language Model Augmented Relevance Score Ruibo Liu Jason W. Wei Soroush Vosoughi 14 10 0 19 Aug 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning Matteo Stefanini Marcella Cornia Lorenzo Baraldi S. Cascianelli G. Fiameni Rita Cucchiara 3DV VLM MLLM 53 244 0 14 Jul 2021
Contrastive Semantic Similarity Learning for Image Captioning Evaluation with Intrinsic Auto-encoder Chao Zeng Tiesong Zhao Sam Kwong 14 2 0 29 Jun 2021
UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning Hwanhee Lee Seunghyun Yoon Franck Dernoncourt Trung Bui Kyomin Jung VLM 11 34 0 26 Jun 2021
SMURF: SeMantic and linguistic UndeRstanding Fusion for Caption Evaluation via Typicality Analysis Joshua Forster Feinglass Yezhou Yang 11 18 0 02 Jun 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning Jack Hessel Ari Holtzman Maxwell Forbes Ronan Le Bras Yejin Choi CLIP 6 882 0 18 Apr 2021
ArtEmis: Affective Language for Visual Art Panos Achlioptas M. Ovsjanikov Kilichbek Haydarov Mohamed Elhoseiny Leonidas J. Guibas 18 115 0 19 Jan 2021
WEmbSim: A Simple yet Effective Metric for Image Captioning Naeha Sharif Lyndon White Bennamoun Wei Liu Syed Afaq Ali Shah 15 1 0 24 Dec 2020
LCEval: Learned Composite Metric for Caption Evaluation Naeha Sharif Lyndon White Bennamoun Wei Liu Syed Afaq Ali Shah 16 8 0 24 Dec 2020
Intrinsic Image Captioning Evaluation Chao Zeng Sam Kwong 11 0 0 14 Dec 2020
Dual Attention on Pyramid Feature Maps for Image Captioning Litao Yu Jian Andrew Zhang Qiang Wu 6 39 0 02 Nov 2020
Quantifying Learnability and Describability of Visual Concepts Emerging in Representation Learning Iro Laina Ruth C. Fong Andrea Vedaldi OCL 12 13 0 27 Oct 2020
Learning Dual Semantic Relations with Graph Attention for Image-Text Matching Keyu Wen Xiaodong Gu Qingrong Cheng 6 95 0 22 Oct 2020
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends Shagun Uppal Sarthak Bhagat Devamanyu Hazarika Navonil Majumdar Soujanya Poria Roger Zimmermann Amir Zadeh 18 6 0 19 Oct 2020
New Ideas and Trends in Deep Multimodal Content Understanding: A Review Wei-Neng Chen Weiping Wang Li Liu M. Lew VLM 105 30 0 16 Oct 2020
Positioning yourself in the maze of Neural Text Generation: A Task-Agnostic Survey Khyathi Raghavi Chandu A. Black 10 0 0 14 Oct 2020