Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI

3 April 2022

Papers citing "Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI"


User and Recommender Behavior Over Time: Contextualizing Activity, Effectiveness, Diversity, and Fairness in Book Recommendation Samira Vaez Barenji Sushobhan Parajuli Michael D. Ekstrand MLAU 47 0 0 07 May 2025
Perils of Label Indeterminacy: A Case Study on Prediction of Neurological Recovery After Cardiac Arrest Jakob Schoeffer Maria De-Arteaga Jonathan Elmer 95 0 0 05 Apr 2025
Talking About the Assumption in the Room Ramaravind Kommiya Mothilal Faisal M. Lalani Syed Ishtiaque Ahmed Shion Guha Sharifa Sultana 56 0 0 20 Feb 2025
Bridging the Communication Gap: Evaluating AI Labeling Practices for Trustworthy AI Development Raphael Fischer Magdalena Wischnewski Alexander van der Staay Katharina Poitz Christian Janiesch Thomas Liebig 45 0 0 21 Jan 2025
The Evolution of LLM Adoption in Industry Data Curation Practices Crystal Qian Michael Xieyang Liu Emily Reif Grady Simon Nada Hussein Nathan Clement James Wexler Carrie J. Cai Michael Terry Minsuk Kahng AILaw ELM 75 4 0 20 Dec 2024
Aligning Generalisation Between Humans and Machines Filip Ilievski Barbara Hammer F. V. Harmelen Benjamin Paassen S. Saralajew ... Vered Shwartz Gabriella Skitalinskaya Clemens Stachl Gido M. van de Ven T. Villmann 68 1 0 23 Nov 2024
BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices Anka Reuel Amelia F. Hardy Chandler Smith Max Lamparth Malcolm Hardy Mykel J. Kochenderfer ELM 72 17 0 20 Nov 2024
Benchmark Data Repositories for Better Benchmarking Rachel Longjohn Markelle Kelly Sameer Singh Padhraic Smyth 41 0 0 31 Oct 2024
BenchmarkCards: Large Language Model and Risk Reporting Anna Sokol Nuno Moniz Elizabeth M. Daly Michael Hind Nitesh V. Chawla 31 0 0 16 Oct 2024
SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Classification Benjamin Feuer Jiawei Xu Niv Cohen Patrick Yubeaton Govind Mittal Chinmay Hegde 21 1 0 07 Oct 2024
From Transparency to Accountability and Back: A Discussion of Access and Evidence in AI Auditing Sarah H. Cen Rohan Alur 21 1 0 07 Oct 2024
Data-Centric AI Governance: Addressing the Limitations of Model-Focused Policies Ritwik Gupta Leah Walker Rodolfo Corona Stephanie Fu Suzanne Petryk Janet Napolitano Trevor Darrell Andrew W. Reddie ELM 35 3 0 25 Sep 2024
Improving governance outcomes through AI documentation: Bridging theory and practice Amy A. Winecoff Miranda Bogen 23 2 0 13 Sep 2024
Blockchain-Enabled Accountability in Data Supply Chain: A Data Bill of Materials Approach Yue Liu Dawen Zhang Boming Xia Julia Anticev Tunde Adebayo Zhenchang Xing Moses Machao 17 0 0 16 Aug 2024
The Future of Open Human Feedback Shachar Don-Yehiya Ben Burtenshaw Ramon Fernandez Astudillo Cailean Osborne Mimansa Jaiswal ... Omri Abend Jennifer Ding Sara Hooker Hannah Rose Kirk Leshem Choshen VLM ALM 62 4 0 15 Aug 2024
AI Research is not Magic, it has to be Reproducible and Responsible: Challenges in the AI field from the Perspective of its PhD Students Andrea Hrckova Jennifer Renoux Rafael Tolosana Calasanz Daniela Chuda Martin Tamajka Jakub Simko 13 0 0 13 Aug 2024
Explainable AI Reloaded: Challenging the XAI Status Quo in the Era of Large Language Models Upol Ehsan Mark O. Riedl 23 2 0 09 Aug 2024
Knowledge Prompting: How Knowledge Engineers Use Large Language Models Elisavet Koutsiana Johanna Walker Michelle Nwachukwu Albert Meroño-Peñuela Elena Simperl 40 1 0 02 Aug 2024
AccessShare: Co-designing Data Access and Sharing with Blind People Rie Kamikubo Farnaz Zamiri Zeraati Kyungjun Lee Hernisa Kacorri 40 1 0 27 Jul 2024
The Contribution of XAI for the Safe Development and Certification of AI: An Expert-Based Analysis Benjamin Frész Vincent Philipp Goebels Safa Omri Danilo Brajovic Andreas Aichele Janika Kutz Jens Neuhüttler Marco F. Huber 28 0 0 22 Jul 2024
Consent in Crisis: The Rapid Decline of the AI Data Commons Shayne Longpre Robert Mahari Ariel N. Lee Campbell Lund Hamidah Oderinwale ... Hanlin Li Daphne Ippolito Sara Hooker Jad Kabbara Sandy Pentland 69 35 0 20 Jul 2024
Data Guards: Challenges and Solutions for Fostering Trust in Data N. Sultanum Dennis Bromley Michael Correll 21 1 0 19 Jul 2024
Antibody DomainBed: Out-of-Distribution Generalization in Therapeutic Protein Design Nataša Tagasovska Ji Won Park Matthieu Kirchmeyer Nathan C. Frey Andrew Watkins ... Arian R. Jamasb Edith Lee Tyler Bryson Stephen Ra Kyunghyun Cho OOD 34 6 0 15 Jul 2024
Position: Measure Dataset Diversity, Don't Just Claim It Dora Zhao Jerone T. A. Andrews Orestis Papakyriakopoulos Alice Xiang 64 14 0 11 Jul 2024
Documentation Practices of Artificial Intelligence Stefan Arnold Dilara Yesilbas Rene Gröbner Dominik Riedelbauch Maik Horn Sven Weinzierl AI4TS 26 0 0 26 Jun 2024
Laminator: Verifiable ML Property Cards using Hardware-assisted Attestations Vasisht Duddu Oskari Jarvinen Lachlan J. Gunn Nirmal Asokan 64 1 0 25 Jun 2024
Let Guidelines Guide You: A Prescriptive Guideline-Centered Data Annotation Methodology Federico Ruggeri Eleonora Misino Arianna Muti Katerina Korre Paolo Torroni Alberto Barrón-Cedeño 30 0 0 20 Jun 2024
Reproducibility in Machine Learning-based Research: Overview, Barriers and Drivers Harald Semmelrock Tony Ross-Hellauer Simone Kopeinik Dieter Theiler Armin Haberl Stefan Thalmann Dominik Kowald 65 6 0 20 Jun 2024
AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents Edoardo Debenedetti Jie Zhang Mislav Balunović Luca Beurer-Kellner Marc Fischer Florian Tramèr LLMAG AAML 43 25 1 19 Jun 2024
Improving the Validity and Practical Usefulness of AI/ML Evaluations Using an Estimands Framework Olivier Binette Jerome P. Reiter 30 0 0 14 Jun 2024
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition Edoardo Debenedetti Javier Rando Daniel Paleka Silaghi Fineas Florin Dragos Albastroiu ... Stefan Kraft Mario Fritz Florian Tramèr Sahar Abdelnabi Lea Schonherr 46 9 0 12 Jun 2024
A Taxonomy of Challenges to Curating Fair Datasets Dora Zhao M. Scheuerman Pooja Chitre Jerone T. A. Andrews Georgia Panagiotidou Shawn Walker Kathleen H. Pine Alice Xiang 39 2 0 10 Jun 2024
Reconfiguring Participatory Design to Resist AI Realism Aakash Gautam 35 3 0 05 Jun 2024
A Standardized Machine-readable Dataset Documentation Format for Responsible AI Nitisha Jain Mubashara Akhtar Joan Giner-Miguelez Rajat Shinde Joaquin Vanschoren ... Costanza Conforti Michael Kuchnik Lora Aroyo Omar Benjelloun Elena Simperl 21 2 0 04 Jun 2024
Towards Transparency: Exploring LLM Trainings Datasets through Visual Topic Modeling and Semantic Frame Charles de Dampierre Andrei Mogoutov Nicolas Baumard 42 1 0 03 Jun 2024
Learning Gaze-aware Compositional GAN Nerea Aranjuelo Siyu Huang Fundación Vicomtech Luis Unzueta Oihana Otaegui Hanspeter Pfister Donglai Wei GAN CVBM 26 0 0 31 May 2024
Automatic Generation of Model and Data Cards: A Step Towards Responsible AI Jiarui Liu Wenkai Li Zhijing Jin Mona T. Diab SyDa 55 3 0 10 May 2024
Benchmarking Benchmark Leakage in Large Language Models Ruijie Xu Zengzhi Wang Run-Ze Fan Pengfei Liu 56 42 0 29 Apr 2024
101 Billion Arabic Words Dataset Manel Aloui Hasna Chouikhi Ghaith Chaabane Haithem Kchaou Chehir Dhaouadi 36 1 0 29 Apr 2024
Modeling the Sacred: Considerations when Using Religious Texts in Natural Language Processing Ben Hutchinson 85 0 0 23 Apr 2024
From Model Performance to Claim: How a Change of Focus in Machine Learning Replicability Can Help Bridge the Responsibility Gap Tianqi Kou 32 0 0 19 Apr 2024
Data Authenticity, Consent, & Provenance for AI are all broken: what will it take to fix them? Shayne Longpre Robert Mahari Naana Obeng-Marnu William Brannon Tobin South Katy Gero Sandy Pentland Jad Kabbara 56 5 0 19 Apr 2024
Racial/Ethnic Categories in AI and Algorithmic Fairness: Why They Matter and What They Represent Jennifer Mickel 28 5 0 10 Apr 2024
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models Patrick Chao Edoardo Debenedetti Alexander Robey Maksym Andriushchenko Francesco Croce ... Nicolas Flammarion George J. Pappas F. Tramèr Hamed Hassani Eric Wong ALM ELM AAML 52 94 0 28 Mar 2024
Decoding the Digital Fine Print: Navigating the potholes in Terms of service/ use of GenAI tools against the emerging need for Transparent and Trustworthy Tech Futures Sundaraparipurnan Narayanan 26 0 0 26 Mar 2024
Dated Data: Tracing Knowledge Cutoffs in Large Language Models Jeffrey Cheng Marc Marone Orion Weller Dawn J Lawrie Daniel Khashabi Benjamin Van Durme 59 12 0 19 Mar 2024
From Fitting Participation to Forging Relationships: The Art of Participatory ML Ned Cooper Alex Zafiroglu 27 9 0 11 Mar 2024
Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations Swapnaja Achintalwar Adriana Alvarado Garcia Ateret Anaby-Tavor Ioana Baldini Sara E. Berger ... Aashka Trivedi Kush R. Varshney Dennis L. Wei Shalisha Witherspooon Marcel Zalmanovici 25 10 0 09 Mar 2024
The Situate AI Guidebook: Co-Designing a Toolkit to Support Multi-Stakeholder Early-stage Deliberations Around Public Sector AI Proposals Anna Kawakami Amanda Coston Haiyi Zhu Hoda Heidari Kenneth Holstein 36 23 0 29 Feb 2024
Automatic Histograms: Leveraging Language Models for Text Dataset Exploration Emily Reif Crystal Qian James Wexler Minsuk Kahng 33 10 0 21 Feb 2024