Evaluation Gaps in Machine Learning Practice

11 May 2022

Negar Rostamzadeh

Christina Greer

Katherine A. Heller

Vinodkumar Prabhakaran

Papers citing "Evaluation Gaps in Machine Learning Practice"

Title | Authors | Tags | Counts | Date
Designing Speech Technologies for Australian Aboriginal English: Opportunities, Risks and Participation | Ben Hutchinson, Celeste Rodríguez Louro, Glenys Collard, Ned Cooper | - | 57 0 0 | 05 Mar 2025
AI Mismatches: Identifying Potential Algorithmic Harms Before AI Development | Devansh Saxena, Ji-Youn Jung, J. Forlizzi, Kenneth Holstein, J. Zimmerman | - | 64 0 0 | 25 Feb 2025
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation | Maria Eriksson, Erasmo Purificato, Arman Noroozian, Joao Vinagre, Guillaume Chaslot, Emilia Gomez, David Fernandez Llorca | ELM | 130 1 0 | 10 Feb 2025
Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements | Leandro von Werra, Lewis Tunstall, A. Thakur, A. Luccioni, Tristan Thrush, ..., Julien Chaumond, Margaret Mitchell, Alexander M. Rush, Thomas Wolf, Douwe Kiela | ELM | 21 24 0 | 30 Sep 2022
Making Intelligence: Ethical Values in IQ and ML Benchmarks | Borhane Blili-Hamelin, Leif Hancox-Li | - | 27 16 0 | 01 Sep 2022
Mapping global dynamics of benchmark creation and saturation in artificial intelligence | Simon Ott, A. Barbosa-Silva, Kathrin Blagec, J. Brauner, Matthias Samwald | - | 24 36 0 | 09 Mar 2022
Thinking Beyond Distributions in Testing Machine Learned Models | Negar Rostamzadeh, B. Hutchinson, Christina Greer, Vinodkumar Prabhakaran | TTA | 32 6 0 | 06 Dec 2021
Systematic Inequalities in Language Technology Performance across the World's Languages | Damián E. Blasi, Antonios Anastasopoulos, Graham Neubig | - | 113 131 0 | 13 Oct 2021
Fairness in Machine Learning | L. Oneto, Silvia Chiappa | FaML | 240 488 0 | 31 Dec 2020
Extracting Training Data from Large Language Models | Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, ..., Tom B. Brown, D. Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel | MLAU, SILM | 281 1,812 0 | 14 Dec 2020
Can I Trust My Fairness Metric? Assessing Fairness with Unlabeled Data and Bayesian Inference | Disi Ji, Padhraic Smyth, M. Steyvers | - | 34 44 0 | 19 Oct 2020
Towards Ecologically Valid Research on Language User Interfaces | H. D. Vries, Dzmitry Bahdanau, Christopher D. Manning | - | 204 51 0 | 28 Jul 2020
Improving fairness in machine learning systems: What do industry practitioners need? | Kenneth Holstein, Jennifer Wortman Vaughan, Hal Daumé, Miroslav Dudík, Hanna M. Wallach | FaML, HAI | 192 742 0 | 13 Dec 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding | Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman | ELM | 297 6,950 0 | 20 Apr 2018
Fair prediction with disparate impact: A study of bias in recidivism prediction instruments | Alexandra Chouldechova | FaML | 207 2,082 0 | 24 Oct 2016