arXiv:2205.05256
Evaluation Gaps in Machine Learning Practice
11 May 2022
Ben Hutchinson
Negar Rostamzadeh
Christina Greer
Katherine A. Heller
Vinodkumar Prabhakaran
ELM
Papers citing
"Evaluation Gaps in Machine Learning Practice"
15 / 15 papers shown
Title
Designing Speech Technologies for Australian Aboriginal English: Opportunities, Risks and Participation
Ben Hutchinson
Celeste Rodríguez Louro
Glenys Collard
Ned Cooper
57
0
0
05 Mar 2025
AI Mismatches: Identifying Potential Algorithmic Harms Before AI Development
Devansh Saxena
Ji-Youn Jung
J. Forlizzi
Kenneth Holstein
J. Zimmerman
64
0
0
25 Feb 2025
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation
Maria Eriksson
Erasmo Purificato
Arman Noroozian
Joao Vinagre
Guillaume Chaslot
Emilia Gomez
David Fernandez Llorca
ELM
130
1
0
10 Feb 2025
Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements
Leandro von Werra
Lewis Tunstall
A. Thakur
A. Luccioni
Tristan Thrush
...
Julien Chaumond
Margaret Mitchell
Alexander M. Rush
Thomas Wolf
Douwe Kiela
ELM
21
24
0
30 Sep 2022
Making Intelligence: Ethical Values in IQ and ML Benchmarks
Borhane Blili-Hamelin
Leif Hancox-Li
27
16
0
01 Sep 2022
Mapping global dynamics of benchmark creation and saturation in artificial intelligence
Simon Ott
A. Barbosa-Silva
Kathrin Blagec
J. Brauner
Matthias Samwald
24
36
0
09 Mar 2022
Thinking Beyond Distributions in Testing Machine Learned Models
Negar Rostamzadeh
Ben Hutchinson
Christina Greer
Vinodkumar Prabhakaran
TTA
32
6
0
06 Dec 2021
Systematic Inequalities in Language Technology Performance across the World's Languages
Damián E. Blasi
Antonios Anastasopoulos
Graham Neubig
113
131
0
13 Oct 2021
Fairness in Machine Learning
L. Oneto
Silvia Chiappa
FaML
240
488
0
31 Dec 2020
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
281
1,812
0
14 Dec 2020
Can I Trust My Fairness Metric? Assessing Fairness with Unlabeled Data and Bayesian Inference
Disi Ji
Padhraic Smyth
M. Steyvers
34
44
0
19 Oct 2020
Towards Ecologically Valid Research on Language User Interfaces
Harm de Vries
Dzmitry Bahdanau
Christopher D. Manning
204
51
0
28 Jul 2020
Improving fairness in machine learning systems: What do industry practitioners need?
Kenneth Holstein
Jennifer Wortman Vaughan
Hal Daumé III
Miroslav Dudík
Hanna M. Wallach
FaML
HAI
192
742
0
13 Dec 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,950
0
20 Apr 2018
Fair prediction with disparate impact: A study of bias in recidivism prediction instruments
Alexandra Chouldechova
FaML
207
2,082
0
24 Oct 2016