The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics

30 October 2023

Papers citing "The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics"

Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans? Jeremy Barnes, Naiara Perez, Alba Bonet-Jover, Begoña Altuna. 21 Mar 2025.
How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs. Ran Zhang, Wei-Ye Zhao, Steffen Eger. 24 Oct 2024.
Natural Language Processing RELIES on Linguistics. Juri Opitz, Shira Wein, Nathan Schneider. 09 May 2024.
Can Large Language Models Be an Alternative to Human Evaluations? Cheng-Han Chiang, Hung-yi Lee. 03 May 2023.
Layer or Representation Space: What makes BERT-based Evaluation Metrics Robust? Doan Nam Long Vu, N. Moosavi, Steffen Eger. 06 Sep 2022.
Self-Consistency Improves Chain of Thought Reasoning in Language Models. Xuezhi Wang, Jason W. Wei, Dale Schuurmans, Quoc Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou. 21 Mar 2022.
Training language models to follow instructions with human feedback. Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe. 04 Mar 2022.
Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform. Zhen Xu, Sergio Escalera, Isabelle M Guyon, Adrien Pavao, M. Richard, Wei-Wei Tu, Quanming Yao, Huan Zhao. 12 Oct 2021.