arXiv:2204.11574
A global analysis of metrics used for measuring performance in natural language processing
25 April 2022
Kathrin Blagec, Georg Dorffner, M. Moradi, Simon Ott, Matthias Samwald
Papers citing "A global analysis of metrics used for measuring performance in natural language processing" (11 papers)
Robot Learning as an Empirical Science: Best Practices for Policy Evaluation
H. Kress-Gazit, Kunimatsu Hashimoto, Naveen Kuppuswamy, Paarth Shah, Phoebe Horgan, Gordon Richardson, Siyuan Feng, Benjamin Burchfiel
14 Sep 2024
Quantifying Variance in Evaluation Benchmarks
Lovish Madaan, Aaditya K. Singh, Rylan Schaeffer, Andrew Poulton, Sanmi Koyejo, Pontus Stenetorp, Sharan Narang, Dieuwke Hupkes
14 Jun 2024
Enhancing Clinical Documentation with Synthetic Data: Leveraging Generative Models for Improved Accuracy
Anjanava Biswas, Wrick Talukdar
SyDa
03 Jun 2024
Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing
Dujian Ding, Ankur Mallick, Chi Wang, Robert Sim, Subhabrata Mukherjee, Victor Rühle, L. Lakshmanan, Ahmed Hassan Awadallah
22 Apr 2024
Unveiling LLM Evaluation Focused on Metrics: Challenges and Solutions
Taojun Hu, Xiao-Hua Zhou
ELM
14 Apr 2024
Prompt Risk Control: A Rigorous Framework for Responsible Deployment of Large Language Models
Thomas P. Zollo, Todd Morrill, Zhun Deng, Jake C. Snell, T. Pitassi, Richard Zemel
22 Nov 2023
Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation
M. Boubdir, Edward Kim, Beyza Ermis, Marzieh Fadaee, Sara Hooker
ALM
22 Oct 2023
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian, Elahe Khatibi, Iman Azimi, David Oniani, Zahra Shakeri Hossein Abad, ..., Bryant Lin, Olivier Gevaert, Li-Jia Li, Ramesh C. Jain, Amir M. Rahmani
LM&MA, ELM, AI4MH
21 Sep 2023
Evaluating Machine Translation Quality with Conformal Predictive Distributions
Patrizio Giovannotti
UQLM
02 Jun 2023
A comprehensive review of automatic text summarization techniques: method, data, evaluation and coding
D. Cajueiro, A. G. Nery, Igor Tavares, Maísa Kely de Melo, Silvia A. dos Reis, Weigang Li, V. R. R. Celestino
04 Jan 2023
Mapping global dynamics of benchmark creation and saturation in artificial intelligence
Simon Ott, A. Barbosa-Silva, Kathrin Blagec, J. Brauner, Matthias Samwald
09 Mar 2022