NLG Evaluation Metrics Beyond Correlation Analysis: An Empirical Metric Preference Checklist

15 May 2023
Iftitahu Ni'mah
Meng Fang
Vlado Menkovski
Mykola Pechenizkiy

Papers citing "NLG Evaluation Metrics Beyond Correlation Analysis: An Empirical Metric Preference Checklist"

17 / 17 papers shown
Verify with Caution: The Pitfalls of Relying on Imperfect Factuality Metrics
Ameya Godbole
Robin Jia
HILM
51
1
0
24 Jan 2025
Socio-Emotional Response Generation: A Human Evaluation Protocol for LLM-Based Conversational Systems
Lorraine Vanel
Ariel R. Ramos Vela
Alya Yacoubi
Chloé Clavel
65
0
0
26 Nov 2024
Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices
Patrícia Schmidtová
Saad Mahamood
Simone Balloccu
Ondřej Dušek
Albert Gatt
Dimitra Gkatzia
David M. Howcroft
Ondřej Plátek
Adarsa Sivaprasad
43
3
0
17 Aug 2024
A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation
I. Zubiaga
A. Soroa
Rodrigo Agerri
29
4
0
21 Jun 2024
Is Reference Necessary in the Evaluation of NLG Systems? When and Where?
Shuqian Sheng
Yi Xu
Luoyi Fu
Jiaxin Ding
Lei Zhou
Xinbing Wang
Cheng Zhou
21
3
0
21 Mar 2024
VIALM: A Survey and Benchmark of Visually Impaired Assistance with Large Models
Yi Zhao
Yilin Zhang
Rong Xiang
Jing Li
Hillming Li
26
16
0
29 Jan 2024
LUNA: A Framework for Language Understanding and Naturalness Assessment
Marat Saidov
A. Bakalova
Ekaterina Taktasheva
Vladislav Mikhailov
Ekaterina Artemova
ELM
17
1
0
09 Jan 2024
Clinfo.ai: An Open-Source Retrieval-Augmented Large Language Model System for Answering Medical Questions using Scientific Literature
Alejandro Lozano
Scott L. Fleming
Chia-Chun Chiang
Nigam Shah
ELM
RALM
18
32
0
24 Oct 2023
MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
Scott L. Fleming
Alejandro Lozano
W. Haberkorn
Jenelle A. Jindal
E. Reis
...
Jonathan H. Chen
Keith Morse
Emma Brunskill
Jason Alan Fries
N. Shah
LM&MA
23
52
0
27 Aug 2023
Evaluating Evaluation Metrics: A Framework for Analyzing NLG Evaluation Metrics using Measurement Theory
Ziang Xiao
Susu Zhang
Vivian Lai
Q. V. Liao
ELM
17
23
0
24 May 2023
Layer or Representation Space: What makes BERT-based Evaluation Metrics Robust?
Doan Nam Long Vu
N. Moosavi
Steffen Eger
11
9
0
06 Sep 2022
The Authenticity Gap in Human Evaluation
Kawin Ethayarajh
Dan Jurafsky
79
24
0
24 May 2022
Types of Out-of-Distribution Texts and How to Detect Them
Udit Arora
William Huang
He He
OODD
207
97
0
14 Sep 2021
Perturbation CheckLists for Evaluating NLG Evaluation Metrics
Ananya B. Sai
Tanay Dixit
D. Y. Sheth
S. Mohan
Mitesh M. Khapra
AAML
97
55
0
13 Sep 2021
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
275
1,561
0
18 Sep 2019
Text Summarization with Pretrained Encoders
Yang Liu
Mirella Lapata
MILM
254
1,417
0
22 Aug 2019
Efficient Estimation of Word Representations in Vector Space
Tomáš Mikolov
Kai Chen
G. Corrado
J. Dean
3DV
228
31,150
0
16 Jan 2013