1803.09010
v8 (latest)
Datasheets for Datasets
23 March 2018
Timnit Gebru
Jamie Morgenstern
Briana Vecchione
Jennifer Wortman Vaughan
Hanna M. Wallach
Hal Daumé
Kate Crawford
Papers citing
"Datasheets for Datasets"
50 / 1,069 papers shown
Benchmarking Multimodal AutoML for Tabular Data with Text Fields
Xingjian Shi
Jonas W. Mueller
Nick Erickson
Mu Li
Alexander J. Smola
LMTD
152
39
0
04 Nov 2021
Feature and Label Embedding Spaces Matter in Addressing Image Classifier Bias
William Thong
Cees G. M. Snoek
148
16
0
27 Oct 2021
IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning
Pan Lu
Liang Qiu
Jiaqi Chen
Tony Xia
Yizhou Zhao
Wei Zhang
Zhou Yu
Xiaodan Liang
Song-Chun Zhu
AIMat
388
262
0
25 Oct 2021
What Would Jiminy Cricket Do? Towards Agents That Behave Morally
Dan Hendrycks
Mantas Mazeika
Andy Zou
Sahil Patel
Christine Zhu
Jesus Navarro
Basel Alomair
Yue Liu
Jacob Steinhardt
242
72
0
25 Oct 2021
Human-Centered Explainable AI (XAI): From Algorithms to User Experiences
Q. V. Liao
R. Varshney
549
283
0
20 Oct 2021
Does Data Repair Lead to Fair Models? Curating Contextually Fair Data To Reduce Model Bias
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021
Sharat Agarwal
Sumanyu Muku
Saket Anand
Chetan Arora
161
15
0
20 Oct 2021
A Framework for Deprecating Datasets: Standardizing Documentation, Identification, and Communication
A. Luccioni
Frances Corry
H. Sridharan
Mike Ananny
J. Schultz
Kate Crawford
348
34
0
18 Oct 2021
Small Data and Process in Data Visualization: The Radical Translations Case Study
Arianna Ciula
Miguel Vieira
Ginestra Ferraro
Tiffany Ong
S. Perovic
Rosa Mucignat
Niccolò Valmori
Brecht Deseure
E. Mannucci
44
1
0
18 Oct 2021
RL4RS: A Real-World Dataset for Reinforcement Learning based Recommender System
Kai Wang
Zhene Zou
Minghao Zhao
Qilin Deng
Yue Shang
Yile Liang
Runze Wu
Xudong Shen
Tangjie Lyu
Changjie Fan
OffRL
317
12
0
18 Oct 2021
BEAMetrics: A Benchmark for Language Generation Evaluation Evaluation
Thomas Scialom
Felix Hill
157
7
0
18 Oct 2021
HumBugDB: A Large-scale Acoustic Mosquito Dataset
Ivan Kiskin
Marianne E. Sinka
Adam D. Cobb
Waqas Rafique
Lawrence Wang
...
E. Kaindoa
G. Killeen
Eva Herreros-Moya
Katherine J. Willis
Stephen J. Roberts
173
37
0
14 Oct 2021
Masader: Metadata Sourcing for Arabic Text and Speech Data Resources
Zaid Alyafeai
Maraim Masoud
Mustafa Ghaleb
Maged S. Al-Shaibani
334
29
0
13 Oct 2021
On Releasing Annotator-Level Labels and Information in Datasets
Linguistic Annotation Workshop (LAW), 2021
Vinodkumar Prabhakaran
Aida Mostafazadeh Davani
Mark Díaz
252
170
0
12 Oct 2021
We Need to Talk About Data: The Importance of Data Readiness in Natural Language Processing
Fredrik Olsson
Magnus Sahlgren
125
1
0
11 Oct 2021
Chaos as an interpretable benchmark for forecasting and data-driven modelling
W. Gilpin
AI4TS
293
106
0
11 Oct 2021
Exploring constraints on CycleGAN-based CBCT enhancement for adaptive radiotherapy
Suraj Pai
MedIm
98
0
0
09 Oct 2021
Inferring Offensiveness In Images From Natural Language Supervision
P. Schramowski
Kristian Kersting
90
2
0
08 Oct 2021
CLEVA-Compass: A Continual Learning EValuation Assessment Compass to Promote Research Transparency and Comparability
Martin Mundt
Steven Braun
Quentin Delfosse
Kristian Kersting
243
38
0
07 Oct 2021
Trustworthy AI: From Principles to Practices
Yue Liu
Peng Qi
Bo Liu
Shuai Di
Jingen Liu
Jiquan Pei
Jinfeng Yi
Bowen Zhou
473
520
0
04 Oct 2021
The VVAD-LRS3 Dataset for Visual Voice Activity Detection
Adrian Lubitz
Matias Valdenegro-Toro
Frank Kirchner
161
4
0
28 Sep 2021
Auditing AI models for Verified Deployment under Semantic Specifications
Homanga Bharadhwaj
De-An Huang
Chaowei Xiao
Anima Anandkumar
Animesh Garg
MLAU
188
6
0
25 Sep 2021
SoK: Machine Learning Governance
Varun Chandrasekaran
Hengrui Jia
Anvith Thudi
Adelin Travers
Mohammad Yaghini
Nicolas Papernot
271
20
0
20 Sep 2021
FUTURE-AI: Guiding Principles and Consensus Recommendations for Trustworthy Artificial Intelligence in Medical Imaging
Karim Lekadir
Richard Osuala
C. Gallin
Noussair Lazrak
Kaisar Kushibar
...
Nickolas Papanikolaou
Zohaib Salahuddin
Henry C. Woodruff
Philippe Lambin
L. Martí-Bonmatí
AI4TS
359
79
0
20 Sep 2021
Studying Up Machine Learning Data: Why Talk About Bias When We Mean Power?
Milagros Miceli
Julian Posada
Tianling Yang
116
74
0
16 Sep 2021
Data Hunches: Incorporating Personal Knowledge into Visualizations
Haihan Lin
Derya Akbaba
Miriah D. Meyer
A. Lex
173
42
0
15 Sep 2021
HPOBench: A Collection of Reproducible Multi-Fidelity Benchmark Problems for HPO
Katharina Eggensperger
Philipp Müller
Neeratyoy Mallik
Matthias Feurer
René Sass
Aaron Klein
Noor H. Awad
Marius Lindauer
Frank Hutter
437
122
0
14 Sep 2021
Generating Datasets of 3D Garments with Sewing Patterns
Maria Korosteleva
Sung-Hee Lee
185
46
0
12 Sep 2021
Making Online Communities 'Better': A Taxonomy of Community Values on Reddit
Galen Cassebeer Weld
Amy X. Zhang
Tim Althoff
299
45
0
11 Sep 2021
Toward a Perspectivist Turn in Ground Truthing for Predictive Computing
AAAI Conference on Artificial Intelligence (AAAI), 2021
Valerio Basile
F. Cabitza
Andrea Campagner
Michael Fell
264
207
0
09 Sep 2021
Datasets: A Community Library for Natural Language Processing
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Quentin Lhoest
Albert Villanova del Moral
Yacine Jernite
A. Thakur
Patrick von Platen
...
Thibault Goehringer
Victor Mustar
François Lagunas
Alexander M. Rush
Thomas Wolf
584
705
0
07 Sep 2021
MultiEURLEX -- A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer
Ilias Chalkidis
Manos Fergadiotis
Ion Androutsopoulos
AILaw
392
133
0
02 Sep 2021
Making the Invisible Visible: Risks and Benefits of Disclosing Metadata in Visualization
Alyxander Burns
Thai On
C. Lee
R. Shapiro
Cindy Xiong
Narges Mahyar
144
10
0
30 Aug 2021
SHIFT15M: Fashion-specific dataset for set-to-set matching with several distribution shifts
Masanari Kimura
Takuma Nakamura
Yuki Saito
OOD
217
3
0
30 Aug 2021
A comparison of approaches to improve worst-case predictive model performance over patient subpopulations
Scientific Reports (Sci Rep), 2021
Stephen Pfohl
Haoran Zhang
Yizhe Xu
Agata Foryciarz
Marzyeh Ghassemi
N. Shah
OOD
289
24
0
27 Aug 2021
Sharing Practices for Datasets Related to Accessibility and Aging
International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS), 2021
Rie Kamikubo
Utkarsh Dwivedi
Hernisa Kacorri
150
15
0
24 Aug 2021
A Framework for Understanding AI-Induced Field Change: How AI Technologies are Legitimized and Institutionalized
B. Larsen
86
7
0
18 Aug 2021
Reusable Templates and Guides For Documenting Datasets and Models for Natural Language Processing and Generation: A Case Study of the HuggingFace and GEM Data and Model Cards
Angelina McMillan-Major
Salomey Osei
Juan Diego Rodriguez
Pawan Sasanka Ammanamanchi
Sebastian Gehrmann
Yacine Jernite
164
54
0
16 Aug 2021
Presenting an extensive lab- and field-image dataset of crops and weeds for computer vision tasks in agriculture
Michael A. Beck
Chen-Yi Liu
C. Bidinosti
C. Henry
Cara M. Godee
Manisha Ajmani
3DV
VLM
97
5
0
12 Aug 2021
Retiring Adult: New Datasets for Fair Machine Learning
Neural Information Processing Systems (NeurIPS), 2021
Frances Ding
Moritz Hardt
John Miller
Ludwig Schmidt
451
544
0
10 Aug 2021
Do Datasets Have Politics? Disciplinary Values in Computer Vision Dataset Development
M. Scheuerman
Emily L. Denton
A. Hanna
228
242
0
09 Aug 2021
On Measures of Biases and Harms in NLP
Sunipa Dev
Emily Sheng
Jieyu Zhao
Aubrie Amstutz
Jiao Sun
...
M. Sanseverino
Jiin Kim
Akihiro Nishi
Nanyun Peng
Kai-Wei Chang
244
108
0
07 Aug 2021
Mitigating Dataset Harms Requires Stewardship: Lessons from 1000 Papers
Kenny Peng
Arunesh Mathur
Arvind Narayanan
349
106
0
06 Aug 2021
An Ethical Framework for Guiding the Development of Affectively-Aware Artificial Intelligence
Affective Computing and Intelligent Interaction (ACII), 2021
Desmond C. Ong
58
35
0
29 Jul 2021
On the state of reporting in crowdsourcing experiments and a checklist to aid current practices
Jorge M. Ramírez
Burcu Sayin
Marcos Báez
Fabio Casati
L. Cernuzzi
B. Benatallah
Gianluca Demartini
206
24
0
28 Jul 2021
QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension
ACM Computing Surveys (CSUR), 2021
Anna Rogers
Matt Gardner
Isabelle Augenstein
375
191
0
27 Jul 2021
Responsible and Regulatory Conform Machine Learning for Medicine: A Survey of Challenges and Solutions
IEEE Access (IEEE Access), 2021
Eike Petersen
Yannik Potdevin
Esfandiar Mohammadi
Stephan Zidowitz
Sabrina Breyer
...
Sandra Henn
Ludwig Pechmann
M. Leucker
P. Rostalski
Christian Herzog
FaML
AILaw
OOD
275
35
0
20 Jul 2021
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
Paul Pu Liang
Yiwei Lyu
Xiang Fan
Zetian Wu
Yun Cheng
...
Peter Wu
Michelle A. Lee
Yuke Zhu
Ruslan Salakhutdinov
Louis-Philippe Morency
VLM
276
223
0
15 Jul 2021
Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks
A. Malinin
Neil Band
Alexander Ganshin
German Chesnokov
...
Denis Roginskiy
Mariya Shmatova
Panos Tigas
Boris Yangel
UQCV
OOD
483
147
0
15 Jul 2021
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
717
770
0
14 Jul 2021
"Garbage In, Garbage Out" Revisited: What Do Machine Learning Application Papers Report About Human-Labeled Training Data?
R. Geiger
Dominique Cope
Jamie Ip
Marsha Lotosh
Aayush Shah
Jenny Weng
Rebekah Tang
149
71
0
05 Jul 2021