The Problem with Metrics is a Fundamental Problem for AI

20 February 2020

Rachel L. Thomas

D. Uminsky


Papers citing "The Problem with Metrics is a Fundamental Problem for AI"


Branching Out: Broadening AI Measurement and Evaluation with Measurement Trees

121

30 Sep 2025

The Inadequacy of Offline LLM Evaluations: A Need to Account for Personalization in Model Behavior

241

18 Sep 2025

An Anthropologist LLM to Elicit Users' Moral Preferences through Role-Play

Gianluca De Ninno

Paola Inverardi

Francesca Belotti

114

20 Aug 2025

Reality Check: A New Evaluation Ecosystem Is Necessary to Understand AI's Real World Effects

...

472

24 May 2025

Beyond Accuracy: EcoL2 Metric for Sustainable Neural PDE Solvers

343

18 May 2025

Beware of "Explanations" of AI

...

429

09 Apr 2025

Predictable Artificial Intelligence

Lexin Zhou

Pablo Antonio Moreno Casares

Fernando Martínez-Plumed

...

Konstantinos Voudouris

José Hernández-Orallo

706

08 Jan 2025

GPT for Games: An Updated Scoping Review (2020-2024)

IEEE Transactions on Games (IEEE Trans. Games), 2024

626

01 Nov 2024

Benchmark Data Repositories for Better Benchmarking

Neural Information Processing Systems (NeurIPS), 2024

305

31 Oct 2024

"This is not a data problem": Algorithms and Power in Public Higher Education in Canada

Kelly McConvey

Shion Guha

327

20 Mar 2024

Promises and pitfalls of artificial intelligence for legal applications

Social Science Research Network (SSRN), 2024

185

10 Jan 2024

A Review of the Evidence for Existential Risk from AI via Misaligned Power-Seeking

Rose Hadshar

203

27 Oct 2023

Large language models can accurately predict searcher preferences

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023

471

246

19 Sep 2023

Scaling Laws Do Not Scale

AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2023

Fernando Diaz

Michael A. Madaio

390

05 Jul 2023

Mapping the Challenges of HCI: An Application and Evaluation of ChatGPT for Mining Insights at Scale

Jonas Oppenlaender

Joonas Hamalainen

ALM ELM

489

08 Jun 2023

Positive AI: Key Challenges in Designing Artificial Intelligence for Wellbeing

393

12 Apr 2023

Aligning Offline Metrics and Human Judgments of Value for Code Generation Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Victor C. Dibia

Adam Fourney

Gagan Bansal

Forough Poursabzi-Sangdeh

Han Liu

Saleema Amershi

ALM OffRL

272

29 Oct 2022

Challenges in Explanation Quality Evaluation

312

13 Oct 2022

Defining and Characterizing Reward Hacking

Joar Skalse

Nikolaus H. R. Howe

Dmitrii Krasheninnikov

David M. Krueger

498

113

27 Sep 2022

Identifying the Context Shift between Test Benchmarks and Production Data

Matthew Groh

OOD

256

03 Jul 2022

Eliciting and Learning with Soft Labels from Every Annotator

AAAI Conference on Human Computation & Crowdsourcing (HCOMP), 2022

Katherine M. Collins

Umang Bhatt

Adrian Weller

510

02 Jul 2022

The Different Faces of AI Ethics Across the World: A Principle-Implementation Gap Analysis

L. Tidjon

Foutse Khomh

174

12 May 2022

Evaluation Gaps in Machine Learning Practice

Conference on Fairness, Accountability and Transparency (FAccT), 2022

Vinodkumar Prabhakaran

ELM

418

11 May 2022

What are you optimizing for? Aligning Recommender Systems with Human Values

Dylan Hadfield-Menell

OffRL

216

22 Jul 2021