The Problem with Metrics is a Fundamental Problem for AI

20 February 2020
Rachel L. Thomas
D. Uminsky

Papers citing "The Problem with Metrics is a Fundamental Problem for AI"

24 / 24 papers shown
Branching Out: Broadening AI Measurement and Evaluation with Measurement Trees
Craig Greenberg
Patrick Hall
Theodore Jensen
Kristen Greene
Razvan Amironesei
30 Sep 2025
The Inadequacy of Offline LLM Evaluations: A Need to Account for Personalization in Model Behavior
Angelina Wang
Mark A. Lemley
Sanmi Koyejo
OffRL
18 Sep 2025
An Anthropologist LLM to Elicit Users' Moral Preferences through Role-Play
Gianluca De Ninno
Paola Inverardi
Francesca Belotti
20 Aug 2025
Reality Check: A New Evaluation Ecosystem Is Necessary to Understand AI's Real World Effects
Reva Schwartz
Rumman Chowdhury
Akash Kundu
Heather Frase
Marzieh Fadaee
...
Andrew Thompson
Maya Carlyle
Qinghua Lu
Matthew Holmes
Theodora Skeadas
24 May 2025
Beyond Accuracy: EcoL2 Metric for Sustainable Neural PDE Solvers
Taniya Kapoor
Abhishek Chandra
Anastasios Stamou
Stephen J Roberts
18 May 2025
Beware of "Explanations" of AI
David Martens
Galit Shmueli
Theodoros Evgeniou
Kevin Bauer
Christian Janiesch
...
Claudia Perlich
Wouter Verbeke
Alona Zharova
Patrick Zschech
F. Provost
09 Apr 2025
Predictable Artificial Intelligence
Lexin Zhou
Pablo Antonio Moreno Casares
Fernando Martínez-Plumed
John Burden
Ryan Burnell
...
Seán Ó hÉigeartaigh
Danaja Rutar
Wout Schellaert
Konstantinos Voudouris
José Hernández-Orallo
08 Jan 2025
GPT for Games: An Updated Scoping Review (2020-2024)
IEEE Transactions on Games (IEEE Trans. Games), 2024
Daijin Yang
Erica Kleinman
Casper Harteveld
LLMAG, AI4TS, AI4CE
01 Nov 2024
Benchmark Data Repositories for Better Benchmarking
Neural Information Processing Systems (NeurIPS), 2024
Rachel Longjohn
Markelle Kelly
Sameer Singh
Padhraic Smyth
31 Oct 2024
"This is not a data problem": Algorithms and Power in Public Higher
  Education in Canada
Kelly McConvey
Shion Guha
20 Mar 2024
Promises and pitfalls of artificial intelligence for legal applications
Social Science Research Network (SSRN), 2024
Sayash Kapoor
Peter Henderson
Arvind Narayanan
ELM
10 Jan 2024
A Review of the Evidence for Existential Risk from AI via Misaligned Power-Seeking
Rose Hadshar
27 Oct 2023
Large language models can accurately predict searcher preferences
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023
Paul Thomas
S. Spielman
Nick Craswell
Bhaskar Mitra
ALM, LRM
19 Sep 2023
Scaling Laws Do Not Scale
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2023
Fernando Diaz
Michael A. Madaio
05 Jul 2023
Mapping the Challenges of HCI: An Application and Evaluation of ChatGPT for Mining Insights at Scale
Jonas Oppenlaender
Joonas Hamalainen
ALM, ELM
08 Jun 2023
Positive AI: Key Challenges in Designing Artificial Intelligence for Wellbeing
Willem van der Maden
Derek Lomas
Malak Sadek
P. Hekkert
12 Apr 2023
Aligning Offline Metrics and Human Judgments of Value for Code Generation Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Victor C. Dibia
Adam Fourney
Gagan Bansal
Forough Poursabzi-Sangdeh
Han Liu
Saleema Amershi
ALM, OffRL
29 Oct 2022
Challenges in Explanation Quality Evaluation
Hendrik Schuff
Heike Adel
Peng Qi
Ngoc Thang Vu
XAI
13 Oct 2022
Defining and Characterizing Reward Hacking
Joar Skalse
Nikolaus H. R. Howe
Dmitrii Krasheninnikov
David M. Krueger
27 Sep 2022
Identifying the Context Shift between Test Benchmarks and Production Data
Matthew Groh
OOD
03 Jul 2022
Eliciting and Learning with Soft Labels from Every Annotator
AAAI Conference on Human Computation & Crowdsourcing (HCOMP), 2022
Katherine M. Collins
Umang Bhatt
Adrian Weller
02 Jul 2022
The Different Faces of AI Ethics Across the World: A Principle-Implementation Gap Analysis
L. Tidjon
Foutse Khomh
12 May 2022
Evaluation Gaps in Machine Learning Practice
Conference on Fairness, Accountability and Transparency (FAccT), 2022
Ben Hutchinson
Negar Rostamzadeh
Christina Greer
Katherine A. Heller
Vinodkumar Prabhakaran
ELM
11 May 2022
What are you optimizing for? Aligning Recommender Systems with Human Values
J. Stray
Ivan Vendrov
Jeremy Nixon
Steven Adler
Dylan Hadfield-Menell
OffRL
22 Jul 2021