Designing Disaggregated Evaluations of AI Systems: Choices, Considerations, and Tradeoffs

10 March 2021

Meredith Ringel Morris

Jennifer Wortman Vaughan

Duncan Wadsworth

Hanna M. Wallach

Papers citing "Designing Disaggregated Evaluations of AI Systems: Choices, Considerations, and Tradeoffs"

Phi-4-reasoning Technical Report Marah Abdin Sahaj Agarwal Ahmed Hassan Awadallah Vidhisha Balachandran Harkirat Singh Behl ... Vaishnavi Shrivastava Vibhav Vineet Yue Wu Safoora Yousefi Guoqing Zheng ReLM LRM 84 0 0 30 Apr 2025
SureMap: Simultaneous Mean Estimation for Single-Task and Multi-Task Disaggregated Evaluation M. Khodak Lester W. Mackey Alexandra Chouldechova Miroslav Dudik 34 0 0 14 Nov 2024
Privacy-Preserving Race/Ethnicity Estimation for Algorithmic Bias Measurement in the U.S. Saikrishna Badrinarayanan Osonde Osoba Miao Cheng Ryan Rogers Sakshi Jain Rahul Tandra Natesh S. Pillai 16 0 0 06 Sep 2024
Reconsidering Sentence-Level Sign Language Translation Garrett Tanzer Maximus Shengelia Ken Harrenstien David C. Uthus SLR 27 2 0 16 Jun 2024
Analysing and Organising Human Communications for AI Fairness-Related Decisions: Use Cases from the Public Sector Mirthe Dankloff Vanja Skoric Giovanni Sileno S. Ghebreab Jacco van Ossenbruggen Emma Beauxis-Aussalet 19 2 0 20 Mar 2024
(Beyond) Reasonable Doubt: Challenges that Public Defenders Face in Scrutinizing AI in Court Angela Jin Niloufar Salehi ELM 29 2 0 13 Mar 2024
Farsight: Fostering Responsible AI Awareness During AI Application Prototyping Zijie J. Wang Chinmay Kulkarni Lauren Wilcox Michael Terry Michael A. Madaio 38 43 0 23 Feb 2024
A Scoping Study of Evaluation Practices for Responsible AI Tools: Steps Towards Effectiveness Evaluations G. Berman Nitesh Goyal Michael A. Madaio ELM 34 20 0 30 Jan 2024
A structured regression approach for evaluating model performance across intersectional subgroups Christine Herlihy Kimberly Truong Alexandra Chouldechova Miroslav Dudik 36 4 0 26 Jan 2024
Explaining CLIP's performance disparities on data from blind/low vision users Daniela Massiceti Camilla Longden Agnieszka Slowik Samuel Wills Martin Grayson C. Morrison VLM 22 9 0 29 Nov 2023
Cultural Bias and Cultural Alignment of Large Language Models Yan Tao Olga Viberg Ryan S. Baker René F. Kizilcec ELM 21 73 0 23 Nov 2023
AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered Applications Bhaktipriya Radharapu Kevin Robinson Lora Aroyo Preethi Lahoti 18 37 0 14 Nov 2023
Scaling Laws Do Not Scale Fernando Diaz Michael A. Madaio 23 8 0 05 Jul 2023
AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap Q. V. Liao J. Vaughan 36 158 0 02 Jun 2023
Centering the Margins: Outlier-Based Identification of Harmed Populations in Toxicity Detection Vyoma Raman Eve Fleisig Dan Klein 19 0 0 24 May 2023
PaLM 2 Technical Report Rohan Anil Andrew M. Dai Orhan Firat Melvin Johnson Dmitry Lepikhin ... Ce Zheng Wei Zhou Denny Zhou Slav Petrov Yonghui Wu ReLM LRM 69 1,142 0 17 May 2023
Consensus and Subjectivity of Skin Tone Annotation for ML Fairness Candice Schumann Gbolahan O. Olanubi Auriel Wright Ellis P. Monk Courtney Heldreth Susanna Ricco 19 21 0 16 May 2023
Racial Bias within Face Recognition: A Survey Seyma Yucer Furkan Tektas Noura Al Moubayed T. Breckon FaML 38 10 0 01 May 2023
Angler: Helping Machine Translation Practitioners Prioritize Model Improvements Samantha Robertson Zijie J. Wang Dominik Moritz Mary Beth Kery Fred Hohman 30 15 0 12 Apr 2023
Fairlearn: Assessing and Improving Fairness of AI Systems Hilde Weerts Miroslav Dudík Richard Edgar Adrin Jalali Roman Lutz Michael Madaio FaML 11 63 0 29 Mar 2023
Can Fairness be Automated? Guidelines and Opportunities for Fairness-aware AutoML Hilde J. P. Weerts Florian Pfisterer Matthias Feurer Katharina Eggensperger Eddie Bergman Noor H. Awad Joaquin Vanschoren Mykola Pechenizkiy B. Bischl Frank Hutter FaML 33 18 0 15 Mar 2023
fAIlureNotes: Supporting Designers in Understanding the Limits of AI Models for Computer Vision Tasks Steven Moore Q. V. Liao Hariharan Subramonyam 13 27 0 22 Feb 2023
Designerly Understanding: Information Needs for Model Transparency to Support Design Ideation for AI-Powered User Experience Q. V. Liao Hariharan Subramonyam Jennifer Wang Jennifer Wortman Vaughan HAI 16 58 0 21 Feb 2023
Towards Reliable Assessments of Demographic Disparities in Multi-Label Image Classifiers Melissa Hall Bobbie Chern Laura Gustafson Denisse Ventura Harshad Kulkarni Candace Ross Nicolas Usunier 25 5 0 16 Feb 2023
Vision-Language Models Performing Zero-Shot Tasks Exhibit Gender-based Disparities Melissa Hall Laura Gustafson Aaron B. Adcock Ishan Misra Candace Ross VLM 32 22 0 26 Jan 2023
Sociotechnical Harms of Algorithmic Systems: Scoping a Taxonomy for Harm Reduction Renee Shelby Shalaleh Rismani Kathryn Henne AJung Moon Negar Rostamzadeh ... N'Mah Yilla-Akbari Jess Gallegos A. Smart Emilio Garcia Gurleen Virk 34 188 0 11 Oct 2022
Weak Proxies are Sufficient and Preferable for Fairness with Missing Sensitive Attributes Zhaowei Zhu Yuanshun Yao Jiankai Sun Hanguang Li Y. Liu 26 21 0 06 Oct 2022
From plane crashes to algorithmic harm: applicability of safety engineering frameworks for responsible ML Shalaleh Rismani Renee Shelby A. Smart Edgar W. Jatho Joshua A. Kroll AJung Moon Negar Rostamzadeh 34 36 0 06 Oct 2022
Matching Consumer Fairness Objectives & Strategies for RecSys Michael D. Ekstrand M. S. Pera FaML 17 3 0 06 Sep 2022
Human-AI Guidelines in Practice: Leaky Abstractions as an Enabler in Collaborative Software Teams Hariharan Subramonyam Jane Im C. Seifert Eytan Adar 17 2 0 04 Jul 2022
Gender Artifacts in Visual Datasets Nicole Meister Dora Zhao Angelina Wang V. V. Ramaswamy Ruth C. Fong Olga Russakovsky 24 28 0 18 Jun 2022
Understanding Machine Learning Practitioners' Data Documentation Perceptions, Needs, Challenges, and Desiderata A. Heger Elizabeth B. Marquis Mihaela Vorvoreanu Hanna M. Wallach J. W. Vaughan 8 60 0 06 Jun 2022
When Personalization Harms: Reconsidering the Use of Group Attributes in Prediction Vinith M. Suriyakumar Marzyeh Ghassemi Berk Ustun 33 6 0 04 Jun 2022
Evaluation Gaps in Machine Learning Practice Ben Hutchinson Negar Rostamzadeh Christina Greer Katherine A. Heller Vinodkumar Prabhakaran ELM 25 56 0 11 May 2022
Towards Intersectionality in Machine Learning: Including More Identities, Handling Underrepresentation, and Performing Evaluation Angelina Wang V. V. Ramaswamy Olga Russakovsky FaML 21 92 0 10 May 2022
Demographic-Reliant Algorithmic Fairness: Characterizing the Risks of Demographic Data Collection in the Pursuit of Fairness Mckane Andrus Sarah Villeneuve FaML 16 50 0 18 Apr 2022
PaLM: Scaling Language Modeling with Pathways Aakanksha Chowdhery Sharan Narang Jacob Devlin Maarten Bosma Gaurav Mishra ... Kathy Meier-Hellstern Douglas Eck J. Dean Slav Petrov Noah Fiedel PILM LRM 83 5,996 0 05 Apr 2022
Assessing the Fairness of AI Systems: AI Practitioners' Processes, Challenges, and Needs for Support Michael A. Madaio Lisa Egede Hariharan Subramonyam Jennifer Wortman Vaughan Hanna M. Wallach 25 141 0 10 Dec 2021
Measuring Hidden Bias within Face Recognition via Racial Phenotypes Seyma Yucer Furkan Tektas Noura Al Moubayed T. Breckon CVBM 19 24 0 19 Oct 2021
FairCanary: Rapid Continuous Explainable Fairness Avijit Ghosh Aalok Shanbhag Christo Wilson 11 20 0 13 Jun 2021
Fair prediction with disparate impact: A study of bias in recidivism prediction instruments Alexandra Chouldechova FaML 201 2,082 0 24 Oct 2016