Data Authenticity, Consent, & Provenance for AI are all broken: what will it take to fix them?

19 April 2024

Naana Obeng-Marnu

William Brannon

Papers citing "Data Authenticity, Consent, & Provenance for AI are all broken: what will it take to fix them?"

Consent in Crisis: The Rapid Decline of the AI Data Commons
  Shayne Longpre, Robert Mahari, Ariel N. Lee, Campbell Lund, Hamidah Oderinwale, ..., Hanlin Li, Daphne Ippolito, Sara Hooker, Jad Kabbara, Sandy Pentland
  34 34 0 | 20 Jul 2024

Building an Ethical and Trustworthy Biomedical AI Ecosystem for the Translational and Clinical Integration of Foundational Models
  Simha Sankar Baradwaj, Destiny Gilliland, Jack Rincon, Henning Hermjakob, Yu Yan, ..., Dean Wang, Karol Watson, Alex Bui, Wei Wang, Peipei Ping
  29 5 0 | 18 Jul 2024

Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models
  Hanlin Zhang, Benjamin L. Edelman, Danilo Francati, Daniele Venturi, G. Ateniese, Boaz Barak
  WaLM | 132 53 0 | 07 Nov 2023

Market Concentration Implications of Foundation Models
  Jai Vipra, Anton Korinek
  ELM | 16 14 0 | 02 Nov 2023

Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
  Peter Henderson, M. Krass, Lucia Zheng, Neel Guha, Christopher D. Manning, Dan Jurafsky, Daniel E. Ho
  AILaw, ELM | 127 94 0 | 01 Jul 2022

'Just What do You Think You're Doing, Dave?' A Checklist for Responsible Data Use in NLP
  Anna Rogers, Timothy Baldwin, Kobi Leins
  102 64 0 | 14 Sep 2021

The Pile: An 800GB Dataset of Diverse Text for Language Modeling
  Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
  AIMat | 236 1,508 0 | 31 Dec 2020

Extracting Training Data from Large Language Models
  Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, ..., Tom B. Brown, D. Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel
  MLAU, SILM | 264 1,798 0 | 14 Dec 2020

Scaling Laws for Neural Language Models
  Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
  220 3,054 0 | 23 Jan 2020