Data and its (dis)contents: A survey of dataset development and use in machine learning research

9 December 2020
Amandalynne Paullada
Inioluwa Deborah Raji
Emily M. Bender
Emily L. Denton
A. Hanna

Papers citing "Data and its (dis)contents: A survey of dataset development and use in machine learning research"

50 / 225 papers shown
Beyond Data Filtering: Knowledge Localization for Capability Removal in LLMs
Igor Shilov
Alex Cloud
Aryo Pradipta Gema
Jacob Goldman-Wetzler
Nina Panickssery
Henry Sleight
Erik Jones
Cem Anil
50
3
0
05 Dec 2025
Culture Affordance Atlas: Reconciling Object Diversity Through Functional Mapping
Joan Nwatu
Longju Bai
Oana Ignat
Rada Mihalcea
117
0
0
02 Dec 2025
Synthetic Data: AI's New Weapon Against Android Malware
Angelo Gaspar Diniz Nogueira
K. Paim
Hendrio Braganca
R. Mansilha
Diego Kreutz
AAML
180
1
0
24 Nov 2025
Bias in, Bias out: Annotation Bias in Multilingual Large Language Models
Xia Cui
Ziyi Huang
Naeemeh Adel
109
0
0
18 Nov 2025
AgentExpt: Automating AI Experiment Design with LLM-based Resource Retrieval Agent
Yu-Feng Li
L. Li
Qingmin Liao
Fengli Xu
Yong Li
LM&Ro
234
0
0
07 Nov 2025
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
Yueqi Song
Ketan Ramaneti
Zaid A. W. Sheikh
Z. Chen
Boyu Gou
...
Xiang Yue
Tao Yu
Huan Sun
Yu-Chuan Su
Graham Neubig
231
5
0
28 Oct 2025
The Benchmarking Epistemology: Construct Validity for Evaluating Machine Learning Models
Timo Freiesleben
Sebastian Zezulka
154
5
0
27 Oct 2025
Data as a Lever: A Neighbouring Datasets Perspective on Predictive Multiplicity
Prakhar Ganesh
Hsiang Hsu
G. Farnadi
148
0
0
24 Oct 2025
Identity-Aware Large Language Models require Cultural Reasoning
Alistair Plum
Anne-Marie Lutgen
Christoph Purschke
Achim Rettinger
LRM
140
5
0
21 Oct 2025
The Digital Mirror: Gender Bias and Occupational Stereotypes in AI-Generated Images
Siiri Leppälampi
Sonja M. Hyrynsalmi
Erno Vanhala
124
1
0
08 Oct 2025
RAISE: A Robot-Assisted Selective Disassembly and Sorting System for End-of-Life Phones
Chang Liu
Badrinath Balasubramaniam
Neal Yancey
Michael Severson
Adam Shine
Philip Bove
Beiwen Li
Xiao Liang
Minghui Zheng
109
10
0
27 Sep 2025
LABELING COPILOT: A Deep Research Agent for Automated Data Curation in Computer Vision
Debargha Ganguly
Sumit Kumar
Ishwar B Balappanawar
Weicong Chen
Shashank Kambhatla
Srinivasan Iyengar
Shivkumar Kalyanaraman
Ponnurangam Kumaraguru
Vipin Chaudhary
VLM
244
2
0
26 Sep 2025
Are You Sure You're Positive? Consolidating Chain-of-Thought Agents with Uncertainty Quantification for Aspect-Category Sentiment Analysis
Filippos Ventirozos
Peter Appleby
Matthew Shardlow
LLMAG
100
1
0
24 Aug 2025
Beyond Internal Data: Bounding and Estimating Fairness from Incomplete Data
Varsha Ramineni
Hossein A. Rahmani
Emine Yilmaz
David Barber
167
0
0
18 Aug 2025
Advancing Data Equity: Practitioner Responsibility and Accountability in NLP Data Practices
Jay L. Cunningham
Kevin Zhongyang Shao
Rock Yuren Pang
Nathaniel Mengist
173
0
0
13 Aug 2025
Hallucination to Truth: A Review of Fact-Checking and Factuality Evaluation in Large Language Models
Subhey Sadi Rahman
M. Islam
Md. Mahbub Alam
Musarrat Zeba
M. R
Sadia Sultana Chowa
M. R
Sami Azam
HILM
LRM
254
14
0
05 Aug 2025
Beyond Internal Data: Constructing Complete Datasets for Fairness Testing
Varsha Ramineni
Hossein A. Rahmani
Emine Yilmaz
David Barber
195
0
0
24 Jul 2025
PARALLELPROMPT: Extracting Parallelism from Large Language Model Queries
Steven Kolawole
Keshav Santhanam
Virginia Smith
Pratiksha Thaker
LRM
211
1
0
23 Jun 2025
A Common Pool of Privacy Problems: Legal and Technical Lessons from a Large-Scale Web-Scraped Machine Learning Dataset
Rachel Hong
Jevan Hutson
William Agnew
Imaad Huda
Tadayoshi Kohno
Jamie Morgenstern
AILaw
SILM
PILM
450
6
0
20 Jun 2025
AI Data Development: A Scorecard for the System Card Framework
Tadesse K. Bahiru
Haileleol Tibebu
Ioannis A. Kakadiaris
228
2
0
02 Jun 2025
MObyGaze: a film dataset of multimodal objectification densely annotated by experts
Julie Tores
Elisa Ancarani
L. Sassatelli
Hui-Yin Wu
Clement Bergman
...
F. Precioso
Thierry Devars
Magali Guaresi
Virginie Julliard
Sarah Lecossais
DiffM
VGen
177
1
0
28 May 2025
We Need to Measure Data Diversity in NLP -- Better and Broader
Dong Nguyen
Esther Ploeger
392
2
0
26 May 2025
Social Bias in Popular Question-Answering Benchmarks
Angelie Kraft
Judith Simon
Sonja Schimmler
525
4
0
21 May 2025
Deepfakes on Demand: the rise of accessible non-consensual deepfake image generators
Conference on Fairness, Accountability and Transparency (FAccT), 2025
Will Hawkins
Chris Russell
Brent Mittelstadt
DiffM
899
19
0
06 May 2025
Investigating the Capabilities and Limitations of Machine Learning for Identifying Bias in English Language Data with Information and Heritage Professionals
International Conference on Human Factors in Computing Systems (CHI), 2025
Lucy Havens
Benjamin Bach
Melissa Mhairi Terras
Beatrice Alex
329
2
0
01 Apr 2025
Toward an Evaluation Science for Generative AI Systems
Laura Weidinger
Deb Raji
Hanna M. Wallach
Margaret Mitchell
Angelina Wang
Olawale Salaudeen
Rishi Bommasani
Sayash Kapoor
Deep Ganguli
Sanmi Koyejo
EGVM
ELM
453
37
0
07 Mar 2025
MONSTER: Monash Scalable Time Series Evaluation Repository
Angus Dempster
Navid Mohammadi Foumani
Chang Wei Tan
Lynn Miller
Amish Mishra
Mahsa Salehi
Charlotte Pelletier
Daniel F. Schmidt
G. Webb
AI4TS
359
3
0
24 Feb 2025
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation
Maria Eriksson
Erasmo Purificato
Arman Noroozian
Joao Vinagre
Guillaume Chaslot
Emilia Gomez
David Fernandez-Llorca
ELM
831
44
0
10 Feb 2025
Large Multimodal Models for Low-Resource Languages: A Survey
Marian Lupascu
Ana-Cristina Rogoz
Mihai-Sorin Stupariu
Radu Tudor Ionescu
491
4
0
08 Feb 2025
Authenticated Delegation and Authorized AI Agents
Tobin South
Samuele Marro
Thomas Hardjono
Robert Mahari
Cedric Deslandes Whitney
Dazza Greenwood
Alan Chan
Alex Pentland
481
34
0
17 Jan 2025
Surveying Attitudinal Alignment Between Large Language Models Vs. Humans Towards 17 Sustainable Development Goals
Qingyang Wu
Ying Xu
Tingsong Xiao
Yunze Xiao
Yitong Li
...
Yichi Zhang
Shanghai Zhong
Yuwei Zhang
Wei Lu
Yifan Yang
420
9
0
17 Jan 2025
The Evolution of LLM Adoption in Industry Data Curation Practices
Crystal Qian
Michael Xieyang Liu
Emily Reif
Grady Simon
Nada Hussein
Nathan Clement
James Wexler
Carrie J. Cai
Michael Terry
Minsuk Kahng
AILaw
ELM
394
10
0
20 Dec 2024
The Evolution and Future Perspectives of Artificial Intelligence Generated Content
Chengzhang Zhu
Luobin Cui
Ying Tang
Jiacun Wang
468
4
0
02 Dec 2024
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2024
Junzhe Chen
Tianshu Zhang
Shijie Huang
Yuwei Niu
Linfeng Zhang
Lijie Wen
Xuming Hu
MLLM
VLM
1.1K
18
0
22 Nov 2024
Value Imprint: A Technique for Auditing the Human Values Embedded in RLHF Datasets
Neural Information Processing Systems (NeurIPS), 2024
Ike Obi
Rohan Pant
Srishti Shekhar Agrawal
Maham Ghazanfar
Aaron Basiletti
309
10
0
18 Nov 2024
A Systematic Review of NeurIPS Dataset Management Practices
Neural Information Processing Systems (NeurIPS), 2024
Yiwei Wu
Leah Ajmani
Shayne Longpre
Hanlin Li
269
1
0
31 Oct 2024
Benchmark Data Repositories for Better Benchmarking
Neural Information Processing Systems (NeurIPS), 2024
Rachel Longjohn
Markelle Kelly
Sameer Singh
Padhraic Smyth
298
15
0
31 Oct 2024
Public Domain 12M: A Highly Aesthetic Image-Text Dataset with Novel Governance Mechanisms
Jordan Meyer
Nick Padgett
Cullen Miller
Laura Exline
235
16
0
30 Oct 2024
Ethics Whitepaper: Whitepaper on Ethical Research into Large Language Models
Eddie L. Ungless
Nikolas Vitsakis
Zeerak Talat
James Garforth
Bjorn Ross
Arno Onken
Atoosa Kasirzadeh
Alexandra Birch
371
3
0
17 Oct 2024
Sound Check: Auditing Audio Datasets
William Agnew
Julia Barnett
Annie Chu
Rachel Hong
Michael Feffer
Robin Netzorg
Harry H. Jiang
Ezra Awumey
Sauvik Das
409
2
0
17 Oct 2024
Evaluating Cultural Awareness of LLMs for Yoruba, Malayalam, and English
Fiifi Dawson
Zainab Mosunmola
Sahil Pocker
Raj Abhijit Dandekar
Rajat Dandekar
Sreedath Panat
327
9
0
14 Sep 2024
Introducing MeMo: A Multimodal Dataset for Memory Modelling in Multiparty Conversations
Maria Tsfasman
Bernd Dudzik
Kristian Fenech
András Lőrincz
Catholijn M. Jonker
Catharine Oertel
361
3
0
07 Sep 2024
Building Better Datasets: Seven Recommendations for Responsible Design from Dataset Creators
Will Orr
Kate Crawford
253
11
0
30 Aug 2024
The Problems with Proxies: Making Data Work Visible through Requester Practices
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2024
Annabel Rothschild
Ding Wang
Niveditha Jayakumar Vilvanathan
Lauren Wilcox
Carl Disalvo
Betsy Disalvo
227
5
0
21 Aug 2024
AI Research is not Magic, it has to be Reproducible and Responsible: Challenges in the AI field from the Perspective of its PhD Students
Andrea Hrckova
Jennifer Renoux
Rafael Tolosana Calasanz
Daniela Chuda
Martin Tamajka
Jakub Simko
147
0
0
13 Aug 2024
The Data Addition Dilemma
Machine Learning in Health Care (MLHC), 2024
Judy Hanwen Shen
Inioluwa Deborah Raji
Irene Y. Chen
366
18
0
08 Aug 2024
To which reference class do you belong? Measuring racial fairness of reference classes with normative modeling
S. Rutherford
T. Wolfers
Charlotte J. Fraza
Nathaniel G. Harrnet
Christian F. Beckmann
H. Ruhé
A. Marquand
CML
473
12
0
26 Jul 2024
Consent in Crisis: The Rapid Decline of the AI Data Commons
Shayne Longpre
Robert Mahari
Ariel N. Lee
Campbell Lund
Hamidah Oderinwale
...
Hanlin Li
Daphne Ippolito
Sara Hooker
Jad Kabbara
Sandy Pentland
450
70
0
20 Jul 2024
Is That Rain? Understanding Effects on Visual Odometry Performance for Autonomous UAVs and Efficient DNN-based Rain Classification at the Edge
Andrea Albanese
Yanran Wang
Davide Brunelli
David E. Boyle
422
1
0
17 Jul 2024
Position: Measure Dataset Diversity, Don't Just Claim It
Dora Zhao
Jerone T. A. Andrews
Orestis Papakyriakopoulos
Alice Xiang
336
34
0
11 Jul 2024