Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research

3 December 2021

Papers citing "Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research"

50 / 78 papers shown

Minimizing Risk Through Minimizing Model-Data Interaction: A Protocol For Relying on Proxy Tasks When Designing Child Sexual Abuse Imagery Detection Models Thamiris Coelho Leo S. F. Ribeiro João Macedo J. A. dos Santos Sandra Avila 16 0 0 10 May 2025
We Need Improved Data Curation and Attribution in AI for Scientific Discovery Mara Graziani Antonio Foncubierta Dimitrios Christofidellis Irina Espejo Morales Malina Molnar Marvin Alberts Matteo Manica Jannis Born 43 0 0 03 Apr 2025
What do Large Language Models Say About Animals? Investigating Risks of Animal Harm in Generated Text Arturs Kanepajs Aditi Basu Sankalpa Ghose Constance Li Akshat Mehta Ronak Mehta Samuel David Tucker-Davis Eric Zhou Bob Fischer ALM ELM 43 0 0 03 Mar 2025
AnnoCaseLaw: A Richly-Annotated Dataset For Benchmarking Explainable Legal Judgment Prediction Magnus Sesodia Alina Petrova John Armour Thomas Lukasiewicz Oana-Maria Camburu P. Dokania Philip H. S. Torr Christian Schroeder de Witt AILaw ELM 41 1 0 28 Feb 2025
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation Maria Eriksson Erasmo Purificato Arman Noroozian Joao Vinagre Guillaume Chaslot Emilia Gomez David Fernandez Llorca ELM 128 1 0 10 Feb 2025
Pricing and Competition for Generative AI Rafid Mahmood 22 3 0 04 Nov 2024
A Systematic Review of NeurIPS Dataset Management Practices Yiwei Wu Leah Ajmani Shayne Longpre Hanlin Li 39 0 0 31 Oct 2024
Benchmark Data Repositories for Better Benchmarking Rachel Longjohn Markelle Kelly Sameer Singh Padhraic Smyth 38 0 0 31 Oct 2024
Scito2M: A 2 Million, 30-Year Cross-disciplinary Dataset for Temporal Scientometric Analysis Yiqiao Jin Yijia Xiao Yiyang Wang Jindong Wang 28 0 0 12 Oct 2024
Enhancing Data Quality through Simple De-duplication: Navigating Responsible Computational Social Science Research Yida Mu Mali Jin Xingyi Song Nikolaos Aletras 18 0 0 04 Oct 2024
Transforming Scholarly Landscapes: Influence of Large Language Models on Academic Fields beyond Computer Science Aniket Pramanick Yufang Hou Saif M. Mohammad Iryna Gurevych 31 1 0 29 Sep 2024
Building Better Datasets: Seven Recommendations for Responsible Design from Dataset Creators Will Orr Kate Crawford 30 3 0 30 Aug 2024
Benchmarks as Microscopes: A Call for Model Metrology Michael Stephen Saxon Ari Holtzman Peter West William Yang Wang Naomi Saphra 29 10 0 22 Jul 2024
A Taxonomy of Challenges to Curating Fair Datasets Dora Zhao M. Scheuerman Pooja Chitre Jerone T. A. Andrews Georgia Panagiotidou Shawn Walker Kathleen H. Pine Alice Xiang 39 2 0 10 Jun 2024
Oil & Water? Diffusion of AI Within and Across Scientific Fields Eamon Duede William Dolan André Bauer Ian T. Foster Karim Lakhani AI4CE 21 4 0 24 May 2024
Adaptive Data Analysis for Growing Data Neil G. Marchant Benjamin I. P. Rubinstein 30 0 0 22 May 2024
Position: Why We Must Rethink Empirical Research in Machine Learning Moritz Herrmann F. J. D. Lange Katharina Eggensperger Giuseppe Casalicchio Marcel Wever Matthias Feurer David Rügamer Eyke Hüllermeier A. Boulesteix Bernd Bischl 44 6 0 03 May 2024
Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks Guanhua Zhang Moritz Hardt 42 7 0 02 May 2024
AI Competitions and Benchmarks: Dataset Development Romain Egele Julio C. S. Jacques Junior Jan N. van Rijn Isabelle M Guyon Xavier Baró Albert Clapés Prasanna Balaprakash Sergio Escalera T. Moeslund Jun Wan 42 0 0 15 Apr 2024
From Protoscience to Epistemic Monoculture: How Benchmarking Set the Stage for the Deep Learning Revolution Bernard J. Koch David Peterson 14 5 0 09 Apr 2024
A Decade's Battle on Dataset Bias: Are We There Yet? Zhuang Liu Kaiming He 37 28 0 13 Mar 2024
Better than classical? The subtle art of benchmarking quantum machine learning models Joseph Bowles Shahnawaz Ahmed Maria Schuld 34 62 0 11 Mar 2024
Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing? Marco Gaido Sara Papi Matteo Negri L. Bentivogli 41 12 0 19 Feb 2024
Copycats: the many lives of a publicly available medical imaging dataset Amelia Jiménez-Sánchez Natalia-Rozalia Avlona Dovile Juodelyte Théo Sourget Caroline Vang-Larsen Anna Rogers Hubert Dariusz Zając V. Cheplygina 27 0 0 09 Feb 2024
[Citation needed] Data usage and citation practices in medical imaging conferences Théo Sourget Ahmet Akkoç Stinna Winther Christine Lyngbye Galsgaard Amelia Jiménez-Sánchez Dovile Juodelyte Caroline Petitjean V. Cheplygina 14 2 0 05 Feb 2024
Navigating Dataset Documentations in AI: A Large-Scale Analysis of Dataset Cards on Hugging Face Xinyu Yang Weixin Liang James Y. Zou CVBM 18 16 0 24 Jan 2024
Challenge design roadmap Hugo Jair Escalante Isabelle M Guyon Addison Howard Walter Reade Sébastien Treguer AI4TS 13 0 0 15 Jan 2024
From Knowledge Representation to Knowledge Organization and Back Fausto Giunchiglia Mayukh Bagchi 8 3 0 12 Dec 2023
Socially Cognizant Robotics for a Technology Enhanced Society Kristin J. Dana Clinton Andrews Kostas Bekris Jacob Feldman Matthew Stone Pernille Hemmer Aaron Mazzeo Hal Salzman Jingang Yi 13 0 0 27 Oct 2023
Eliciting Model Steering Interactions from Users via Data and Visual Design Probes Anamaria Crisan Maddie Shang Eric Brochu 20 3 0 12 Oct 2023
The Rise of Open Science: Tracking the Evolution and Perceived Value of Data and Methods Link-Sharing Practices Hancheng Cao Jesse Dodge Kyle Lo Daniel A. McFarland Lucy Lu Wang AI4CE 22 5 0 04 Oct 2023
Can large language models provide useful feedback on research papers? A large-scale empirical analysis Weixin Liang Yuhui Zhang Hancheng Cao Binglu Wang Daisy Ding ... Siyu He D. Smith Yian Yin Daniel A. McFarland James Y. Zou ALM LM&MA 40 123 0 03 Oct 2023
RRR-Net: Reusing, Reducing, and Recycling a Deep Backbone Network Haozhe Sun Isabelle M Guyon F. Mohr Hedi Tabia CVBM 17 2 0 02 Oct 2023
Berkeley Open Extended Reality Recordings 2023 (BOXRR-23): 4.7 Million Motion Capture Recordings from 105,852 Extended Reality Device Users V. Nair Wenbo Guo Rui Wang J. F. O'Brien Louis B. Rosenberg Dawn Song 13 7 0 30 Sep 2023
Inferring Capabilities from Task Performance with Bayesian Triangulation John Burden Konstantinos Voudouris Ryan Burnell Danaja Rutar Lucy G. Cheke José Hernández Orallo 16 7 0 21 Sep 2023
FACET: Fairness in Computer Vision Evaluation Benchmark Laura Gustafson Chloe Rolland Nikhila Ravi Quentin Duval Aaron B. Adcock Cheng-Yang Fu Melissa Hall Candace Ross VLM EGVM 16 36 0 31 Aug 2023
Towards Federated Foundation Models: Scalable Dataset Pipelines for Group-Structured Learning Zachary B. Charles Nicole Mitchell Krishna Pillutla Michael Reneer Zachary Garrett FedML AI4CE 28 28 0 18 Jul 2023
Mapping the Challenges of HCI: An Application and Evaluation of ChatGPT and GPT-4 for Mining Insights at Scale Jonas Oppenlaender Joonas Hamalainen 25 6 0 08 Jun 2023
A Diachronic Analysis of Paradigm Shifts in NLP Research: When, How, and Why? Aniket Pramanick Yufang Hou Saif M. Mohammad Iryna Gurevych 14 6 0 22 May 2023
Learning from data with structured missingness R. Mitra Sarah F. McGough Tapabrata (Rohan) Chakraborty Chris Holmes Ryan Copping ... M. Mackintosh E. Andrinopoulou A. Basiri Chris Harbron Ben D. MacArthur CML 11 44 0 04 Apr 2023
A View From Somewhere: Human-Centric Face Representations Jerone T. A. Andrews Przemyslaw K. Joniak Alice Xiang CVBM 11 9 0 30 Mar 2023
Ecosystem Graphs: The Social Footprint of Foundation Models Rishi Bommasani Dilara Soylu Thomas I. Liao Kathleen A. Creel Percy Liang MLAU 27 32 0 28 Mar 2023
CoCon: A Data Set on Combined Contextualized Research Artifact Use T. Saier Youxiang Dong Michael Färber 9 1 0 27 Mar 2023
Aligning benchmark datasets for table structure recognition B. Smock Rohith Pesala Robin Abraham LMTD 14 8 0 01 Mar 2023
Benchmarks for Automated Commonsense Reasoning: A Survey E. Davis ELM LRM 19 57 0 09 Feb 2023
Ethical Considerations for Responsible Data Curation Jerone T. A. Andrews Dora Zhao William Thong Apostolos Modas Orestis Papakyriakopoulos Alice Xiang 17 19 0 07 Feb 2023
LEXTREME: A Multi-Lingual and Multi-Task Benchmark for the Legal Domain Joel Niklaus Veton Matoshi Pooja Rani Andrea Galassi Matthias Sturmer Ilias Chalkidis ELM AILaw 19 54 0 30 Jan 2023
Neural Architecture Search: Insights from 1000 Papers Colin White Mahmoud Safari R. Sukthanker Binxin Ru T. Elsken Arber Zela Debadeepta Dey Frank Hutter 3DV AI4CE 32 128 0 20 Jan 2023
Evaluation for Change Rishi Bommasani ELM 35 0 0 20 Dec 2022
Graph Learning Indexer: A Contributor-Friendly and Metadata-Rich Platform for Graph Learning Benchmarks Jiaqi Ma Xingjian Zhang Hezheng Fan Jin Huang Tianyue Li Tinghong Li Yiwen Tu Chen Zhu Qiaozhu Mei 35 5 0 08 Dec 2022