
Datasheets for Datasets

23 March 2018

Timnit Gebru

Jamie Morgenstern

Briana Vecchione

Jennifer Wortman Vaughan

Papers citing "Datasheets for Datasets"

50 / 1,069 papers shown

A Feminist Account of Intersectional Algorithmic Fairness Marie Mirsch Laila Wegner Jonas Strube Carmen Leicht-Scholten FaML 180 0 0 25 Aug 2025
EmoTale: An Enacted Speech-emotion Dataset in Danish Maja J. Hjuler Harald V. Skat-Rørdam Line H. Clemmensen Sneha Das 76 1 0 20 Aug 2025
Assessing Trustworthiness of AI Training Dataset using Subjective Logic -- A Use Case on Bias Koffi Ismael Ouattara Ioannis Krontiris Theo Dimitrakos Frank Kargl 96 3 0 19 Aug 2025
OPTIC-ER: A Reinforcement Learning Framework for Real-Time Emergency Response and Equitable Resource Allocation in Underserved African Communities Mary Tonwe 144 0 0 18 Aug 2025
Documenting Deployment with Fabric: A Repository of Real-World AI Governance Mackenzie Jorgensen Kendall Brogle Katherine M. Collins Lujain Ibrahim Arina Shah ... Paul Dongha Hatim Abdulhussein Adrian Weller Jillian Powers Umang Bhatt 182 0 0 18 Aug 2025
Beyond Internal Data: Bounding and Estimating Fairness from Incomplete Data Varsha Ramineni Hossein A. Rahmani Emine Yilmaz David Barber 112 0 0 18 Aug 2025
Street Review: A Participatory AI-Based Framework for Assessing Streetscape Inclusivity. Cities (Cities), 2025 Rashid Mushkani Shin Koseki 189 7 0 14 Aug 2025
TechOps: Technical Documentation Templates for the AI Act Laura Lucaj Alex Loosley Hakan Jonsson Urs Gasser Patrick van der Smagt 76 1 0 12 Aug 2025
Towards Experience-Centered AI: A Framework for Integrating Lived Experience in Design and Development Sanjana Gautam Mohit Chandra Ankolika De Tatiana Chakravorti Girik Malik M. D. Choudhury 74 0 0 09 Aug 2025
Dynaword: From One-shot to Continuously Developed Datasets Kenneth Enevoldsen Kristian Nørgaard Jensen Jan Kostkan Balázs Szabó Márton Kardos ... Per Møldrup Dalum Desmond Elliott Lukas Galke Peter Schneider-Kamp Kristoffer Nielbo 126 0 0 04 Aug 2025
OVFact: Measuring and Improving Open-Vocabulary Factuality for Long Caption Models Monika Wysoczańska Shyamal Buch Anurag Arnab Cordelia Schmid HILM 168 0 0 25 Jul 2025
Beyond Internal Data: Constructing Complete Datasets for Fairness Testing Varsha Ramineni Hossein A. Rahmani Emine Yilmaz David Barber 126 0 0 24 Jul 2025
CNS-Bench: Benchmarking Image Classifier Robustness Under Continuous Nuisance Shifts Olaf Dünkel Artur Jesslen Jiahao Xie Christian Theobalt Christian Rupprecht Adam Kortylewski DiffM 188 0 0 23 Jul 2025
Characterizing Online Activities Contributing to Suicide Mortality among Youth Aparna Ananthasubramaniam Elyse J. Thulin Viktoryia Kalesnikava Silas Falde Jonathan Kertawidjaja Lily Johns Alejandro Rodríguez-Putnam Emma Spring Kara Zivin Briana Mezuk LRM 71 0 0 22 Jul 2025
Predictive Representativity: Uncovering Racial Bias in AI-based Skin Cancer Detection Andrés Morales-Forero Lili J. Rueda Ronald Herrera Samuel Bassetto Eric Coatanea 62 0 0 10 Jul 2025
No Language Data Left Behind: A Comparative Study of CJK Language Datasets in the Hugging Face Ecosystem Dasol Choi Woomyoung Park Youngsook Song 146 0 0 06 Jul 2025
Measurement as Bricolage: Examining How Data Scientists Construct Target Variables for Predictive Modeling Tasks Luke M. Guerdan Devansh Saxena Stevie Chancellor Zhiwei Steven Wu Kenneth Holstein 187 1 0 03 Jul 2025
A case for data valuation transparency via DValCards Keziah Naggita Julienne LaChance TDI 360 0 0 29 Jun 2025
LAION-C: An Out-of-Distribution Benchmark for Web-Scale Vision Models Fanfei Li Thomas Klein Wieland Brendel Robert Geirhos Roland S. Zimmermann OODD 192 3 0 20 Jun 2025
A Common Pool of Privacy Problems: Legal and Technical Lessons from a Large-Scale Web-Scraped Machine Learning Dataset Rachel Hong Jevan Hutson William Agnew Imaad Huda Tadayoshi Kohno Jamie Morgenstern AILaw 314 3 0 20 Jun 2025
Identifying and Investigating Global News Coverage of Critical Events Such as Disasters and Terrorist Attacks. International Conference on Web and Social Media (ICWSM), 2025 Erica Cai Xi Chen Reagan Grey Keeney Ethan Zuckerman Brendan O'Connor Przemyslaw A. Grabowicz 122 1 0 15 Jun 2025
IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments Florian Bordes Q. Garrido Justine T Kao Adina Williams Michael G. Rabbat Emmanuel Dupoux PINN 196 12 0 11 Jun 2025
Survey on the Evaluation of Generative Models in Music. ACM Computing Surveys (ACM Comput. Surv.), 2025 Alexander Lerch Claire Arthur Nick Bryan-Kinns Corey Ford Qianyi Sun Ashvala Vinay 588 4 0 05 Jun 2025
Red Teaming AI Policy: A Taxonomy of Avoision and the EU AI Act. Conference on Fairness, Accountability and Transparency (FAccT), 2025 Rui-Jie Yew Bill Marino Suresh Venkatasubramanian 158 3 0 02 Jun 2025
Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability Genta Indra Winata David Anugraha Emmy Liu Alham Fikri Aji Shou-Yi Hung ... Muhammad Farid Adilazuarda En-Shiun Annie Lee Ayu Purwarianti Derry Wijaya Monojit Choudhury 318 2 0 02 Jun 2025
AI Data Development: A Scorecard for the System Card Framework Tadesse K. Bahiru Haileleol Tibebu Ioannis A. Kakadiaris 161 2 0 02 Jun 2025
Developing a Risk Identification Framework for Foundation Model Uses David Piorkowski Michael Hind John T. Richards Jacquelyn Martino 111 1 0 01 Jun 2025
Risks of AI-driven product development and strategies for their mitigation Jan Göpfert J. Weinand Patrick Kuckertz Noah Pflugradt Jochen Linßen 207 1 0 28 May 2025
Machine Learning Models Have a Supply Chain Problem Sarah Meiklejohn Hayden Blauzvern Mihai Maruseac Spencer Schrock Laurent Simon Ilia Shumailov 197 2 0 28 May 2025
Detecting Cultural Differences in News Video Thumbnails via Computational Aesthetics Marvin Limpijankit John Kender 208 0 0 28 May 2025
MObyGaze: a film dataset of multimodal objectification densely annotated by experts Julie Tores Elisa Ancarani L. Sassatelli Hui-Yin Wu Clement Bergman ... F. Precioso Thierry Devars Magali Guaresi Virginie Julliard Sarah Lecossais DiffM VGen 150 1 0 28 May 2025
Can we Debias Social Stereotypes in AI-Generated Images? Examining Text-to-Image Outputs and User Perceptions Saharsh Barve Andy Mao Jiayue Melissa Shi Prerna Juneja Koustuv Saha 213 0 0 27 May 2025
We Need to Measure Data Diversity in NLP -- Better and Broader Dong Nguyen Esther Ploeger 235 1 0 26 May 2025
MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs Zaid Alyafeai Maged S. Al-Shaibani Bernard Ghanem 281 4 0 26 May 2025
Fairness-in-the-Workflow: How Machine Learning Practitioners at Big Tech Companies Approach Fairness in Recommender Systems Jing Nathan Yan Emma Harvey Junxiong Wang Jeffrey M. Rzeszotarski Allison Koenecke FaML 290 0 0 26 May 2025
TEDI: Trustworthy and Ethical Dataset Indicators to Analyze and Compare Dataset Documentation Wiebke Hutiri Mircea Cimpoi M. Scheuerman Victoria Matthews Alice Xiang 305 0 0 23 May 2025
Multi-agent Systems for Misinformation Lifecycle : Detection, Correction And Source Identification Aditya Gautam LLMAG 171 1 0 23 May 2025
Optimizing Image Capture for Computer Vision-Powered Taxonomic Identification and Trait Recognition of Biodiversity Specimens. Methods in Ecology and Evolution (MEE), 2025 Alyson East Elizabeth G. Campolongo Luke Meyers S M Rayeed Samuel Stevens ... Hilmar Lapp Paula M. Mabee Graham W. Taylor Sydne Record 160 4 0 22 May 2025
Social Bias in Popular Question-Answering Benchmarks Angelie Kraft Judith Simon Sonja Schimmler 366 3 0 21 May 2025
Fast, Not Fancy: Rethinking G2P with Rich Data and Rule-Based Models Mahta Fetrat Qharabagh Zahra Dehghanian Hamid R. Rabiee 135 1 0 19 May 2025
Towards SFW sampling for diffusion models via external conditioning Camilo Carvajal Reyes J. Fontbona Felipe A. Tobar DiffM 257 1 0 12 May 2025
UKElectionNarratives: A Dataset of Misleading Narratives Surrounding Recent UK General Elections. International Conference on Web and Social Media (ICWSM), 2025 Fatima Haouari Carolina Scarton Nicolò Faggiani Nikolaos Nikolaidis Bonka Kotseva Ibrahim Abu Farha Jens Linge Kalina Bontcheva 261 0 0 08 May 2025
Data Therapist: Eliciting Domain Knowledge from Subject Matter Experts Using Large Language Models Sungbok Shin Hyeon Jeon Sanghyun Hong Niklas Elmqvist 1.2K 0 0 01 May 2025
Clustering Internet Memes Through Template Matching and Multi-Dimensional Similarity. International Conference on Web and Social Media (ICWSM), 2025 Tygo Bloem Filip Ilievski 268 1 0 30 Apr 2025
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models Mihai Nadas Laura Diosan Andrei Piscoran Andreea Tomescu VGen 320 1 0 29 Apr 2025
Pneuma: Leveraging LLMs for Tabular Data Representation and Retrieval in an End-to-End System Muhammad Imam Luthfi Balaka David Alexander Qian Wang Yue Gong Adila Krisnadhi Raul Castro Fernandez LMTD RALM 188 10 0 12 Apr 2025
Perils of Label Indeterminacy: A Case Study on Prediction of Neurological Recovery After Cardiac Arrest. Conference on Fairness, Accountability and Transparency (FAccT), 2025 Jakob Schoeffer Maria De-Arteaga Jonathan Elmer 923 2 0 05 Apr 2025
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models José P. Pombal Nuno M. Guerreiro Ricardo Rei André F. T. Martins ALM 544 7 0 01 Apr 2025
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery? Computer Vision and Pattern Recognition (CVPR), 2025 Fengxiang Wang Hongru Wang Mingshuo Chen Haiyan Zhao Yulin Wang ... L. Lan Wenjing Yang Jing Zhang Zhiyuan Liu Maosong Sun 296 23 0 31 Mar 2025
Are clinicians ethically obligated to disclose their use of medical machine learning systems to patients? Journal of Medical Ethics (JME), 2024 Joshua Hatherley 254 3 0 31 Mar 2025