2407.14933
Consent in Crisis: The Rapid Decline of the AI Data Commons
20 July 2024
Shayne Longpre
Robert Mahari
Ariel N. Lee
Campbell Lund
Hamidah Oderinwale
William Brannon
Nayan Saxena
Naana Obeng-Marnu
Tobin South
Cole J. Hunter
Kevin Klyman
Christopher Klamm
Hailey Schoelkopf
Nikhil Singh
Manuel Cherep
Ahmad Anis
An Dinh
Caroline Chitongo
Da Yin
Damien Sileo
Deividas Mataciunas
Diganta Misra
Emad A. Alghamdi
Enrico Shippole
Jianguo Zhang
Joanna Materzynska
Kun Qian
Kush Tiwary
Lester James Validad Miranda
Manan Dey
Minnie Liang
Mohammed Hamdy
Niklas Muennighoff
Seonghyeon Ye
Seungone Kim
Shrestha Mohanty
Vipul Gupta
Vivek Sharma
Vu Minh Chien
Xuhui Zhou
Yizhi Li
Caiming Xiong
Luis Villa
Stella Biderman
Hanlin Li
Daphne Ippolito
Sara Hooker
Jad Kabbara
Sandy Pentland
Papers citing "Consent in Crisis: The Rapid Decline of the AI Data Commons"
15 / 15 papers shown
Title
Beyond Public Access in LLM Pre-Training Data
Sruly Rosenblat
Tim O'Reilly
Ilan Strauss
MLAU
53
0
0
24 Apr 2025
Beyond Release: Access Considerations for Generative AI Systems
Irene Solaiman
Rishi Bommasani
Dan Hendrycks
Ariel Herbert-Voss
Yacine Jernite
Aviya Skowron
Andrew Trask
58
1
0
23 Feb 2025
SoK: Decentralized AI (DeAI)
Zhipeng Wang
Rui Sun
Elizabeth Lui
Vatsal Shah
Xihan Xiong
Jiahao Sun
Davide Crapis
William Knottenbelt
94
1
0
26 Nov 2024
Improving Model Evaluation using SMART Filtering of Benchmark Datasets
Vipul Gupta
Candace Ross
David Pantoja
R. Passonneau
Megan Ung
Adina Williams
43
1
0
26 Oct 2024
DAWN-ICL: Strategic Planning of Problem-solving Trajectories for Zero-Shot In-Context Learning
Xinyu Tang
Xiaolei Wang
Wayne Xin Zhao
Ji-Rong Wen
38
3
0
26 Oct 2024
YODAS: Youtube-Oriented Dataset for Audio and Speech
Xinjian Li
Shinnosuke Takamichi
Takaaki Saeki
William Chen
Sayaka Shiota
Shinji Watanabe
38
16
0
02 Jun 2024
WildChat: 1M ChatGPT Interaction Logs in the Wild
Wenting Zhao
Xiang Ren
Jack Hessel
Claire Cardie
Yejin Choi
Yuntian Deng
40
171
0
02 May 2024
OLMo: Accelerating the Science of Language Models
Dirk Groeneveld
Iz Beltagy
Pete Walsh
Akshita Bhagia
Rodney Michael Kinney
...
Jesse Dodge
Kyle Lo
Luca Soldaini
Noah A. Smith
Hanna Hajishirzi
OSLM
127
349
0
01 Feb 2024
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
203
2,232
0
22 Mar 2023
What Language Model to Train if You Have One Million GPU Hours?
Teven Le Scao
Thomas Wang
Daniel Hesslow
Lucile Saulnier
Stas Bekman
...
Lintang Sutawika
Jaesung Tae
Zheng-Xin Yong
Julien Launay
Iz Beltagy
MoE
AI4CE
212
103
0
27 Oct 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
203
1,651
0
15 Oct 2021
Challenges in Detoxifying Language Models
Johannes Welbl
Amelia Glaese
J. Uesato
Sumanth Dathathri
John F. J. Mellor
Lisa Anne Hendricks
Kirsty Anderson
Pushmeet Kohli
Ben Coppin
Po-Sen Huang
LM&MA
242
191
0
15 Sep 2021
Memorization vs. Generalization: Quantifying Data Leakage in NLP Performance Evaluation
Aparna Elangovan
Jiayuan He
Karin Verspoor
TDI
FedML
156
89
0
03 Feb 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
242
1,977
0
31 Dec 2020