
Datasheets for Datasets

23 March 2018

Timnit Gebru

Jamie Morgenstern

Briana Vecchione

Jennifer Wortman Vaughan

Papers citing "Datasheets for Datasets"

50 / 1,068 papers shown

ClevrSkills: Compositional Language and Visual Reasoning in Robotics. Neural Information Processing Systems (NeurIPS), 2024 Sanjay Haresh Daniel Dijkman Apratim Bhattacharyya Roland Memisevic CoGe LRM 205 7 0 13 Nov 2024
Beyond the Numbers: Transparency in Relation Extraction Benchmark Creation and Leaderboards Varvara Arzt Allan Hanbury 216 2 0 07 Nov 2024
ROAD-Waymo: Action Awareness at Scale for Autonomous Driving Salman Khan Izzeddin Teeti Reza Javanmard Alitappeh Mihaela C. Stoian Eleonora Giunchiglia Gurkirt Singh Andrew Bradley Fabio Cuzzolin 233 2 0 03 Nov 2024
A Systematic Review of NeurIPS Dataset Management Practices. Neural Information Processing Systems (NeurIPS), 2024 Yiwei Wu Leah Ajmani Shayne Longpre Hanlin Li 208 0 0 31 Oct 2024
Benchmark Data Repositories for Better Benchmarking. Neural Information Processing Systems (NeurIPS), 2024 Rachel Longjohn Markelle Kelly Sameer Singh Padhraic Smyth 231 10 0 31 Oct 2024
AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery. Neural Information Processing Systems (NeurIPS), 2024 Hangyu Zhou Chia-Hsiang Kao Cheng Perng Phoo Utkarsh Mall Bharath Hariharan Kavita Bala 172 5 0 31 Oct 2024
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages. Neural Information Processing Systems (NeurIPS), 2024 Amir Hossein Kargaran François Yvon Hinrich Schutze VLM 254 11 0 31 Oct 2024
Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map. Computer Vision and Pattern Recognition (CVPR), 2024 Xinyuan Chang Maixuan Xue Xinran Liu Zheng Pan Xing Wei 477 7 0 31 Oct 2024
OpenSatMap: A Fine-grained High-resolution Satellite Dataset for Large-scale Map Construction. Neural Information Processing Systems (NeurIPS), 2024 Hongbo Zhao Lue Fan Yuntao Chen Haochen Wang Yiran Yang Xiaojuan Jin Yixin Zhang Gaofeng Meng Rundong Wang 213 9 0 30 Oct 2024
Public Domain 12M: A Highly Aesthetic Image-Text Dataset with Novel Governance Mechanisms Jordan Meyer Nick Padgett Cullen Miller Laura Exline 182 12 0 30 Oct 2024
EvoCodeBench: An Evolving Code Generation Benchmark with Domain-Specific Evaluations. Neural Information Processing Systems (NeurIPS), 2024 Jia Li Ge Li Xuanming Zhang Yunfei Zhao Yihong Dong Zhi Jin Binhua Li Fei Huang Yongbin Li ALM ELM 251 33 0 30 Oct 2024
Assessing the Auditability of AI-integrating Systems: A Framework and Learning Analytics Case Study Linda Fernsel Yannick Kalff Katharina Simbeck 154 4 0 29 Oct 2024
SceneGenAgent: Precise Industrial Scene Generation with Coding Agent. Annual Meeting of the Association for Computational Linguistics (ACL), 2024 Xiao Xia Dan Zhang Zibo Liao Zhenyu Hou Tianrui Sun Jing Li Ling Fu Yuxiao Dong AI4CE LM&Ro 3DV LLMAG 308 5 0 29 Oct 2024
Towards Human-centered Design of Explainable Artificial Intelligence (XAI): A Survey of Empirical Studies Shuai Ma 234 5 0 28 Oct 2024
NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates. Neural Information Processing Systems (NeurIPS), 2024 Hexuan Deng Wenxiang Jiao Xuebo Liu Min Zhang Zhaopeng Tu 231 7 0 28 Oct 2024
Harmony4D: A Video Dataset for In-The-Wild Close Human Interactions. Neural Information Processing Systems (NeurIPS), 2024 Rawal Khirodkar Jyun-Ting Song Jinkun Cao Zhengyi Luo Kris Kitani 307 11 0 27 Oct 2024
Engineering Trustworthy AI: A Developer Guide for Empirical Risk Minimization. IEEE Transactions on Artificial Intelligence (IEEE TAI), 2024 Diana Pfau Alexander Jung 245 1 0 25 Oct 2024
Ethics Whitepaper: Whitepaper on Ethical Research into Large Language Models Eddie L. Ungless Nikolas Vitsakis Zeerak Talat James Garforth Bjorn Ross Arno Onken Atoosa Kasirzadeh Alexandra Birch 242 3 0 17 Oct 2024
Data Defenses Against Large Language Models William Agnew Harry H. Jiang Cella Sum Maarten Sap Sauvik Das AAML 268 0 0 17 Oct 2024
Sound Check: Auditing Audio Datasets William Agnew Julia Barnett Annie Chu Rachel Hong Michael Feffer Robin Netzorg Harry H. Jiang Ezra Awumey Sauvik Das 326 2 0 17 Oct 2024
Building Better: Avoiding Pitfalls in Developing Language Resources when Data is Scarce. Annual Meeting of the Association for Computational Linguistics (ACL), 2024 N. Ousidhoum Meriem Beloucif Saif M. Mohammad 528 1 0 16 Oct 2024
BenchmarkCards: Standardized Documentation for Large Language Model Benchmarks Anna Sokol Elizabeth M. Daly Michael Hind David Piorkowski Xiangliang Zhang Nuno Moniz Nitesh Chawla 279 0 0 16 Oct 2024
To Err is AI : A Case Study Informing LLM Flaw Reporting Practices. AAAI Conference on Artificial Intelligence (AAAI), 2024 Sean McGregor Allyson Ettinger Nick Judd Paul Albee Liwei Jiang ... Avijit Ghosh Christopher Fiorelli Michelle Hoang Sven Cattell Nouha Dziri 148 5 0 15 Oct 2024
Visual-Geometric Collaborative Guidance for Affordance Learning Hongchen Luo Wei-dong Zhai Jiashuo Wang Yang Cao Zheng-jun Zha 247 1 0 15 Oct 2024
CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection Guankun Wang Han Xiao Huxin Gao Renrui Zhang Long Bai Xiaoxiao Yang Zhen Li Hongsheng Li Hongliang Ren 191 9 0 10 Oct 2024
Detecting Training Data of Large Language Models via Expectation Maximization Gyuwan Kim Yang Li Evangelia Spiliopoulou Jie Ma Miguel Ballesteros William Yang Wang MIALM 622 9 2 10 Oct 2024
Data Publishing in Mechanics and Dynamics: Challenges, Guidelines, and Examples from Engineering Design. Data-Centric Engineering (DCE), 2024 Henrik Ebel J. V. Delden Timo Luddecke Aditya Borse Rutwik Gulakala ... Kristin Miriam de Payrebrune Maximilian Raff C. D. Remy Benedict Röder P. Eberhard AI4CE 178 1 0 07 Oct 2024
From Transparency to Accountability and Back: A Discussion of Access and Evidence in AI Auditing. Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO), 2024 Sarah H. Cen Rohan Alur 221 8 0 07 Oct 2024
On the Reliability of Large Language Models to Misinformed and Demographically-Informed Prompts. The AI Magazine (AI Mag.), 2024 Toluwani Aremu Oluwakemi Akinwehinmi Chukwuemeka Nwagu Syed Ishtiaque Ahmed Rita Orji Pedro Arnau Del Amo Abdulmotaleb El Saddik 275 6 0 06 Oct 2024
AirLetters: An Open Video Dataset of Characters Drawn in the Air Rishit Dagli Guillaume Berger Joanna Materzynska Ingo Bax Roland Memisevic VGen 164 1 0 03 Oct 2024
Mitigating Downstream Model Risks via Model Provenance Keyu Wang Abdullah Norozi Iranzad Scott Schaffter Doina Precup Jonathan Lebensold 222 1 0 03 Oct 2024
SteerDiff: Steering towards Safe Text-to-Image Diffusion Models Hongxiang Zhang Yifeng He Hao Chen 252 7 0 03 Oct 2024
Uncertainty Modelling and Robust Observer Synthesis using the Koopman Operator Steven Dahdah James R. Forbes 200 3 0 01 Oct 2024
CableInspect-AD: An Expert-Annotated Anomaly Detection Dataset. Neural Information Processing Systems (NeurIPS), 2024 Akshatha Arodi Margaux Luck Jean-Luc Bedwani Aldo Zaimi Ge Li Nicolas Pouliot Julien Beaudry Gaétan Marceau Caron 128 3 0 30 Sep 2024
A Critical Look at Meta-evaluating Summarisation Evaluation Metrics. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 Xiang Dai Sarvnaz Karimi Biaoyan Fang 229 1 0 29 Sep 2024
Responsible AI in Open Ecosystems: Reconciling Innovation with Risk Assessment and Disclosure Mahasweta Chakraborti Bert Joseph Prestoza Nicholas Vincent Seth Frey 227 1 0 27 Sep 2024
Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization. Neural Information Processing Systems (NeurIPS), 2024 Mucong Ding Chenghao Deng Jocelyn Choo Zichu Wu Aakriti Agrawal ... Wanrong Zhu Tom Goldstein John Langford Anima Anandkumar Furong Huang 297 7 0 27 Sep 2024
Wildlife Product Trading in Online Social Networks: A Case Study on Ivory-Related Product Sales Promotion Posts. International Conference on Web and Social Media (ICWSM), 2024 Guanyi Mou Yun Yue Kyumin Lee Ziming Zhang OnRL 78 0 0 25 Sep 2024
Creative Writers' Attitudes on Writing as Training Data for Large Language Models. International Conference on Human Factors in Computing Systems (CHI), 2024 Katy Ilonka Gero Meera Desai Carly Schnitzler Nayun Eom Jack Cushman Elena L. Glassman 204 7 0 22 Sep 2024
ALPEC: A Comprehensive Evaluation Framework and Dataset for Machine Learning-Based Arousal Detection in Clinical Practice Stefan Kraft Andreas Theissler Vera Wienhausen-Wilke Philipp Walter Gjergji Kasneci 239 0 0 20 Sep 2024
A quest through interconnected datasets: lessons from highly-cited ICASSP papers. International Conference on Content-Based Multimedia Indexing (CBMI), 2024 Cynthia C. S. Liem Doğa Taşcılar Andrew M. Demetriou 144 0 0 19 Sep 2024
SHIRE: Enhancing Sample Efficiency using Human Intuition in REinforcement Learning. IEEE International Conference on Robotics and Automation (ICRA), 2024 Amogh Joshi Adarsh Kosta Kaushik Roy OffRL 343 4 0 16 Sep 2024
Keeping Humans in the Loop: Human-Centered Automated Annotation with Generative AI. International Conference on Web and Social Media (ICWSM), 2024 Nicholas Pangakis Samuel Wolken 275 13 0 14 Sep 2024
Improving governance outcomes through AI documentation: Bridging theory and practice. International Conference on Human Factors in Computing Systems (CHI), 2024 Amy A. Winecoff Miranda Bogen 214 7 0 13 Sep 2024
ManaTTS Persian: a recipe for creating TTS datasets for lower resource languages. North American Chapter of the Association for Computational Linguistics (NAACL), 2024 Mahta Fetrat Qharabagh Zahra Dehghanian Hamid R. Rabiee 187 6 0 11 Sep 2024
Ethical Challenges in Computer Vision: Ensuring Privacy and Mitigating Bias in Publicly Available Datasets Ghalib Ahmed Tahir 239 2 0 31 Aug 2024
LLMs generate structurally realistic social networks but overestimate political homophily. International Conference on Web and Social Media (ICWSM), 2024 Serina Chang Alicja Chaszczewicz Emma Wang Maya Josifovska Emma Pierson J. Leskovec 316 21 0 29 Aug 2024
Complexity as Design Material Florian Windhager Alfie Abduhl-Rahman Mark-Jan Bludau Nicole Hengesbach Houda Lamqaddam Isabel Meirelles Bettina Speckmann Michael Correll 145 3 0 27 Aug 2024
Relationships are Complicated! An Analysis of Relationships Between Datasets on the Web. International Workshop on the Semantic Web (SW), 2024 Kate Lin Tarfah Alrashed Natasha Noy 117 2 0 26 Aug 2024
Do Responsible AI Artifacts Advance Stakeholder Goals? Four Key Barriers Perceived by Legal and Civil Stakeholders. AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2024 Anna Kawakami Daricia Wilkinson Alexandra Chouldechova 137 6 0 22 Aug 2024