Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI

3 April 2022

Papers citing "Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI"

50 / 110 papers shown

Automatic Histograms: Leveraging Language Models for Text Dataset Exploration Emily Reif Crystal Qian James Wexler Minsuk Kahng 33 10 0 21 Feb 2024
The METRIC-framework for assessing data quality for trustworthy AI in medicine: a systematic review Daniel Schwabe Katinka Becker Martin Seyferth Andreas Klass Tobias Schäffter 29 20 0 21 Feb 2024
Sketching AI Concepts with Capabilities and Examples: AI Innovation in the Intensive Care Unit Nur Yildirim Susanna Zlotnikov Deniz Sayar Jeremy M. Kahn L. Bukowski ... Venkatesh Sivaraman Adam Perer Sarah Preum James McCann John Zimmerman 31 14 0 21 Feb 2024
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows Ajay Patel Colin Raffel Chris Callison-Burch SyDa AI4CE 25 25 0 16 Feb 2024
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning Shivalika Singh Freddie Vargus Daniel D'souza Börje F. Karlsson Abinaya Mahendiran ... Max Bartolo Julia Kreutzer A. Ustun Marzieh Fadaee Sara Hooker 117 115 0 09 Feb 2024
Copycats: the many lives of a publicly available medical imaging dataset Amelia Jiménez-Sánchez Natalia-Rozalia Avlona Dovile Juodelyte Théo Sourget Caroline Vang-Larsen Anna Rogers Hubert Dariusz Zając V. Cheplygina 27 0 0 09 Feb 2024
What's documented in AI? Systematic Analysis of 32K AI Model Cards Weixin Liang Nazneen Rajani Xinyu Yang Ezinwanne Ozoani Eric Wu Yiqun Chen D. Smith James Y. Zou 33 15 0 07 Feb 2024
A Scoping Study of Evaluation Practices for Responsible AI Tools: Steps Towards Effectiveness Evaluations G. Berman Nitesh Goyal Michael A. Madaio ELM 34 20 0 30 Jan 2024
Navigating Dataset Documentations in AI: A Large-Scale Analysis of Dataset Cards on Hugging Face Xinyu Yang Weixin Liang James Y. Zou CVBM 24 16 0 24 Jan 2024
Towards Conversational Diagnostic AI Tao Tu Anil Palepu M. Schaekermann Khaled Saab Jan Freyberg ... Katherine Chou Greg S. Corrado Yossi Matias Alan Karthikesalingam Vivek Natarajan AI4MH LM&MA 26 92 0 11 Jan 2024
Open Datasheets: Machine-readable Documentation for Open Datasets and Responsible AI Assessments Anthony C. Roman Jennifer Wortman Vaughan Valerie See Steph Ballard Jehu Torres Vega Caleb Robinson J. L. Ferres 22 4 0 11 Dec 2023
SoUnD Framework: Analyzing (So)cial Representation in (Un)structured (D)ata Mark Díaz Sunipa Dev Emily Reif Remi Denton Vinodkumar Prabhakaran 33 3 0 28 Nov 2023
Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval Nandan Thakur Jianmo Ni Gustavo Hernández Ábrego John Wieting Jimmy J. Lin Daniel Matthew Cer RALM 29 12 0 10 Nov 2023
Is a Seat at the Table Enough? Engaging Teachers and Students in Dataset Specification for ML in Education Mei Tan Hansol Lee Dakuo Wang Hariharan Subramonyam 21 7 0 09 Nov 2023
Principles from Clinical Research for NLP Model Generalization Aparna Elangovan Jiayuan He Yuan Li Karin Verspoor CML 22 3 0 07 Nov 2023
The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI Shayne Longpre Robert Mahari Anthony Chen Naana Obeng-Marnu Damien Sileo ... K. Bollacker Tongshuang Wu Luis Villa Sandy Pentland Sara Hooker 15 55 0 25 Oct 2023
A State-Vector Framework for Dataset Effects E. Sahak Zining Zhu Frank Rudzicz 20 1 0 17 Oct 2023
Path To Gain Functional Transparency In Artificial Intelligence With Meaningful Explainability Md. Tanzib Hosain Md. Mehedi Hasan Anik Sadman Rafi Rana Tabassum Khaleque Insia Md. Mehrab Siddiky 13 6 0 13 Oct 2023
How to Data in Datathons Carlos Mougan Richard Plant Clare Teng Marya Bazzi Alvaro Cabrejas-Egea Ryan Sze-Yin Chan David Salvador Jasin Martin Stoffel K. Whitaker Jules Manser 20 1 0 18 Sep 2023
FACET: Fairness in Computer Vision Evaluation Benchmark Laura Gustafson Chloe Rolland Nikhila Ravi Quentin Duval Aaron B. Adcock Cheng-Yang Fu Melissa Hall Candace Ross VLM EGVM 16 36 0 31 Aug 2023
Collect, Measure, Repeat: Reliability Factors for Responsible AI Data Collection Oana Inel Tim Draws Lora Aroyo 28 6 0 22 Aug 2023
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding K. Mangalam Raiymbek Akshulakov Jitendra Malik 25 245 0 17 Aug 2023
Visualising category recoding and numeric redistributions Cynthia A. Huang 13 1 0 12 Aug 2023
No Fair Lunch: A Causal Perspective on Dataset Bias in Machine Learning for Medical Imaging Charles Jones Daniel Coelho De Castro Fabio De Sousa Ribeiro Ozan Oktay Melissa McCradden Ben Glocker FaML CML 30 9 0 31 Jul 2023
ARB: Advanced Reasoning Benchmark for Large Language Models Tomohiro Sawada Daniel Paleka Alexander Havrilla Pranav Tadepalli Paula Vidas Alexander Kranias John J. Nay Kshitij Gupta Aran Komatsuzaki ELM LRM 29 37 0 25 Jul 2023
Model Reporting for Certifiable AI: A Proposal from Merging EU Regulation into AI Development Danilo Brajovic Niclas Renner Vincent Philipp Goebels Philipp Wagner Benjamin Frész M. Biller Mara Klaeb Janika Kutz Jens Neuhuettler Marco F. Huber 19 8 0 21 Jul 2023
Analyzing Dataset Annotation Quality Management in the Wild Jan-Christoph Klie Richard Eckart de Castilho Iryna Gurevych 10 17 0 16 Jul 2023
SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality Cheng-Yu Hsieh Jieyu Zhang Zixian Ma Aniruddha Kembhavi Ranjay Krishna CoGe 38 115 0 26 Jun 2023
Use case cards: a use case reporting framework inspired by the European AI Act Isabelle Hupont David Fernández Llorca S. Baldassarri Emilia Gómez 19 18 0 23 Jun 2023
Towards Regulatable AI Systems: Technical Gaps and Policy Opportunities Xudong Shen H. Brown Jiashu Tao Martin Strobel Yao Tong Akshay Narayan Harold Soh Finale Doshi-Velez 27 3 0 22 Jun 2023
Reproducibility in NLP: What Have We Learned from the Checklist? Ian H. Magnusson Noah A. Smith Jesse Dodge 12 11 0 16 Jun 2023
On Bias and Fairness in NLP: Investigating the Impact of Bias and Debiasing in Language Models on the Fairness of Toxicity Detection Fatma Elsafoury Stamos Katsigiannis 30 1 0 22 May 2023
SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models Akshita Jha Aida Mostafazadeh Davani Chandan K. Reddy Shachi Dave Vinodkumar Prabhakaran Sunipa Dev 23 40 0 19 May 2023
PaLM 2 Technical Report Rohan Anil Andrew M. Dai Orhan Firat Melvin Johnson Dmitry Lepikhin ... Ce Zheng Wei Zhou Denny Zhou Slav Petrov Yonghui Wu ReLM LRM 62 1,142 0 17 May 2023
Consensus and Subjectivity of Skin Tone Annotation for ML Fairness Candice Schumann Gbolahan O. Olanubi Auriel Wright Ellis P. Monk Courtney Heldreth Susanna Ricco 17 21 0 16 May 2023
Understanding accountability in algorithmic supply chains Jennifer Cobbe Michael Veale Jatinder Singh 50 60 0 28 Apr 2023
The Dataset Multiplicity Problem: How Unreliable Data Impacts Predictions Anna P. Meyer Aws Albarghouthi Loris D'Antoni 22 13 0 20 Apr 2023
Right the docs: Characterising voice dataset documentation practices used in machine learning Kathy Reid Elizabeth T. Williams 12 2 0 19 Mar 2023
Auditing large language models: a three-layered approach Jakob Mokander Jonas Schuett Hannah Rose Kirk Luciano Floridi AILaw MLAU 34 194 0 16 Feb 2023
A Systematic Literature Review of Human-Centered, Ethical, and Responsible AI Mohammad Tahaei Marios Constantinides Daniele Quercia Michael J. Muller AI4TS 40 8 0 10 Feb 2023
Ethical Considerations for Responsible Data Curation Jerone T. A. Andrews Dora Zhao William Thong Apostolos Modas Orestis Papakyriakopoulos Alice Xiang 17 19 0 07 Feb 2023
Charting the Sociotechnical Gap in Explainable AI: A Framework to Address the Gap in XAI Upol Ehsan Koustuv Saha M. D. Choudhury Mark O. Riedl 18 57 0 01 Feb 2023
Investigating How Practitioners Use Human-AI Guidelines: A Case Study on the People + AI Guidebook Nur Yildirim Mahima Pushkarna Nitesh Goyal Martin Wattenberg Fernanda Viégas 34 66 0 28 Jan 2023
Mephisto: A Framework for Portable, Reproducible, and Iterative Crowdsourcing Jack Urbanek Pratik Ringshia FedML 6 7 0 12 Jan 2023
Manifestations of Xenophobia in AI Systems Nenad Tomašev J. L. Maynard Iason Gabriel 24 9 0 15 Dec 2022
A Brief Overview of AI Governance for Responsible Machine Learning Systems Navdeep Gill Abhishek Mathur Marcos V. Conde 19 5 0 21 Nov 2022
Understanding Text Classification Data and Models Using Aggregated Input Salience Sebastian Ebert Alice Shoshana Jakobovits Katja Filippova FAtt 17 3 0 10 Nov 2022
Scaling Instruction-Finetuned Language Models Hyung Won Chung Le Hou Shayne Longpre Barret Zoph Yi Tay ... Jacob Devlin Adam Roberts Denny Zhou Quoc V. Le Jason W. Wei ReLM LRM 56 2,986 0 20 Oct 2022
Documenting use cases in the affective computing domain using Unified Modeling Language Isabelle Hupont Emilia Gómez 15 3 0 19 Sep 2022
PaLI: A Jointly-Scaled Multilingual Language-Image Model Xi Chen Xiao Wang Soravit Changpinyo A. Piergiovanni Piotr Padlewski ... Andreas Steiner A. Angelova Xiaohua Zhai N. Houlsby Radu Soricut MLLM VLM 26 682 0 14 Sep 2022