Data-centric Artificial Intelligence: A Survey
ACM Computing Surveys (ACM Comput. Surv.), 2023
17 March 2023
Daochen Zha
Zaid Pervaiz Bhat
Kwei-Herng Lai
Fan Yang
Zhimeng Jiang
Shaochen Zhong
Helen Zhou

Papers citing "Data-centric Artificial Intelligence: A Survey"

50 / 65 papers shown
"AGI" team at SHROOM-CAP: Data-Centric Approach to Multilingual Hallucination Detection using XLM-RoBERTa
Harsh Rathva
Pruthwik Mishra
Shrikant Malviya
HILM
30
0
0
23 Nov 2025
What's the next frontier for Data-centric AI? Data Savvy Agents
Nabeel Seedat
Jiashuo Liu
Mihaela van der Schaar
96
0
0
02 Nov 2025
MalDataGen: A Modular Framework for Synthetic Tabular Data Generation in Malware Detection
K. Paim
Angelo Gaspar Diniz Nogueira
Diego Kreutz
Weverton Cordeiro
R. Mansilha
36
1
0
01 Nov 2025
Accumulative SGD Influence Estimation for Data Attribution
Yunxiao Shi
Shuo Yang
Yixin Su
Rui-Xun Zhang
Min Xu
TDI
195
0
0
30 Oct 2025
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
Yueqi Song
Ketan Ramaneti
Zaid A. W. Sheikh
Z. Chen
Boyu Gou
...
Xiang Yue
Tao Yu
Huan Sun
Yu-Chuan Su
Graham Neubig
152
0
0
28 Oct 2025
Filtering instances and rejecting predictions to obtain reliable models in healthcare
Maria Gabriela Valeriano
David Kohan Marzagão
Alfredo Montelongo
Carlos Roberto Veiga Kiffer
Natan Katz
A. C. Lorena
62
0
0
28 Oct 2025
Reliability of Large Language Model Generated Clinical Reasoning in Assisted Reproductive Technology: Blinded Comparative Evaluation Study
Dou Liu
Ying Long
Sophia Zuoqiu
Di Liu
Kang Li
Yiting Lin
Hanyi Liu
Rong Yin
Tian Tang
ELM
113
1
0
17 Oct 2025
Combining Discrepancy-Confusion Uncertainty and Calibration Diversity for Active Fine-Grained Image Classification
Yinghao Jin
Xi Yang
UQCV
172
0
0
29 Sep 2025
Optimizing Class Distributions for Bias-Aware Multi-Class Learning
Mirco Felske
Stefan Stiene
96
0
0
15 Sep 2025
MEGG: Replay via Maximally Extreme GGscore in Incremental Learning for Neural Recommendation Models
Yunxiao Shi
Shuo Yang
Haimin Zhang
Li Wang
Yongze Wang
Qiang Wu
Min Xu
100
1
0
09 Sep 2025
Distribution Shift Aware Neural Tabular Learning
Wangyang Ying
Nanxu Gong
Dongjie Wang
Xinyuan Wang
Arun Vignesh Malarkkan
Vivek Gupta
Chandan K. Reddy
Yanjie Fu
OOD
170
3
0
27 Aug 2025
Chunked Data Shapley: A Scalable Dataset Quality Assessment for Machine Learning
Andreas Loizou
Dimitrios Tsoumakos
TDI
136
0
0
22 Aug 2025
Defining and Benchmarking a Data-Centric Design Space for Brain Graph Construction
Qinwen Ge
Roza G. Bayrak
Anwar Said
Catie Chang
X. Koutsoukos
Tyler Derr
64
0
0
17 Aug 2025
OpenConstruction: A Systematic Synthesis of Open Visual Datasets for Data-Centric Artificial Intelligence in Construction Monitoring
Ruoxin Xiong
Yanyu Wang
Jiannan Cai
Kaijian Liu
Yuansheng Zhu
P. Tang
Nora El-Gohary
3DV, AI4TS
64
0
0
15 Aug 2025
LoSemB: Logic-Guided Semantic Bridging for Inductive Tool Retrieval
Luyao Zhuang
Qinggang Zhang
Huachi Zhou
Juhua Liu
Qing Li
Xiao Huang
RALM, KELM
87
1
0
11 Aug 2025
Empowering Time Series Forecasting with LLM-Agents
Chin-Chia Michael Yeh
Vivian Lai
Uday Singh Saini
Xiran Fan
Yujie Fan
Junpeng Wang
Xin Dai
Yan Zheng
AI4TS, LLMAG, AIFin, AI4CE
210
3
0
06 Aug 2025
Feature Shift Localization Network
Míriam Barrabés
D. M. Montserrat
Kapal Dev
A. Ioannidis
OOD
156
0
0
10 Jun 2025
Enhancing Orthopox Image Classification Using Hybrid Machine Learning and Deep Learning Models
Alejandro Puente-Castro
Enrique Fernández-Blanco
Daniel Rivero
Andres Molares-Ulloa
113
0
0
06 Jun 2025
Collaborative Multi-Agent Reinforcement Learning for Automated Feature Transformation with Graph-Driven Path Optimization
Xiaohan Huang
Dongjie Wang
Zhiyuan Ning
Ziyue Qiao
Qingqing Long
Haowei Zhu
Yi Du
Min-Ying Wu
Yuanchun Zhou
Meng Xiao
394
3
0
24 Apr 2025
Global Renewables Watch: A Temporal Dataset of Solar and Wind Energy Derived from Satellite Imagery
Caleb Robinson
Anthony Ortiz
Allen Kim
Rahul Dodhia
Andrew Zolli
Shivaprakash K. Nagaraju
J. O
J. Kiesecker
J. L. Ferres
210
3
0
19 Mar 2025
Automatic quality control in multi-centric fetal brain MRI super-resolution reconstruction
Thomas Sanchez
Vladyslav Zalevsky
Angeline Mihailo
Gerard Martí Juan
E. Eixarch
Andras Jakab
Vincent Dunet
Mériam Koob
G. Auzias
Meritxell Bach Cuadra
269
0
0
13 Mar 2025
The Algorithmic State Architecture (ASA): An Integrated Framework for AI-Enabled Government
Zeynep Engin
Jon Crowcroft
David Hand
Philip Treleaven
276
4
0
11 Mar 2025
Empowering Edge Intelligence: A Comprehensive Survey on On-Device AI Models
ACM Computing Surveys (ACM Comput. Surv.), 2025
Xubin Wang
Zhiqing Tang
Jianxiong Guo
Tianhui Meng
Chenhao Wang
Tian-sheng Wang
Weijia Jia
342
49
0
08 Mar 2025
EDCA - An Evolutionary Data-Centric AutoML Framework for Efficient Pipelines
Joana Simões
João Correia
856
1
0
06 Mar 2025
Analytics Modelling over Multiple Datasets using Vector Embeddings
International Conference on Database and Expert Systems Applications (DEXA), 2025
Andreas Loizou
Dimitrios Tsoumakos
372
0
0
24 Feb 2025
Is This Collection Worth My LLM's Time? Automatically Measuring Information Potential in Text Corpora
Tristan Karch
Luca Engel
Philippe Schwaller
Frédéric Kaplan
277
0
0
19 Feb 2025
Revisiting Rogers' Paradox in the Context of Human-AI Interaction
Katherine M. Collins
Umang Bhatt
Ilia Sucholutsky
306
2
0
16 Jan 2025
Interpolation pour l'augmentation de données : Application à la gestion des adventices de la canne à sucre à La Réunion (Interpolation for data augmentation: application to weed management in sugarcane on Réunion Island)
Frédérick Fabre Ferber
Dominique Gay
Jean-Christophe Soulié
Jean Diatta
Odalric-Ambrym Maillard
119
0
0
10 Jan 2025
Cognitive Edge Computing: A Comprehensive Survey on Optimizing Large Models and AI Agents for Pervasive Deployment
International Conference on Artificial Neural Networks (ICANN), 2025
Xubin Wang
Weijia Jia
413
21
0
04 Jan 2025
General Information Metrics for Improving AI Model Training Efficiency
Artificial Intelligence Review (AIR), 2025
Jianfeng Xu
Congcong Liu
Xiaoying Tan
Xiaojie Zhu
Anpeng Wu
...
Weijun Kong
Chun Li
Hu Xu
Kun Kuang
Leilei Gan
304
3
0
02 Jan 2025
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models
Yulei Qin
Yuncheng Yang
Pengcheng Guo
Gang Li
Hang Shao
Yuchen Shi
Zihan Xu
Yun Gu
Ke Li
Xing Sun
ALM
659
21
0
31 Dec 2024
Data Quality Control in Federated Instruction-tuning of Large Language Models
Yaxin Du
Guangyi Liu
Fengting Yuchi
W. Zhao
Jingjing Qu
Yanjie Wang
Siheng Chen
ALM, FedML
257
3
0
15 Oct 2024
Federated Data-Efficient Instruction Tuning for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Zhen Qin
Zhaomin Wu
Bingsheng He
Shuiguang Deng
FedML
286
3
0
14 Oct 2024
Scrambled text: training Language Models to correct OCR errors using synthetic data
Jonathan Bourne
SyDa
193
3
0
29 Sep 2024
AdapFair: Ensuring Adaptive Fairness for Machine Learning Operations
Yinghui Huang
Zihao Tang
Xiangyu Chang
FaML
172
0
0
23 Sep 2024
AceParse: A Comprehensive Dataset with Diverse Structured Texts for Academic Literature Parsing
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Huawei Ji
Cheng Deng
Bo Xue
Zhouyang Jin
Jiaxin Ding
Xiaoying Gan
Luoyi Fu
Xinbing Wang
Chenghu Zhou
155
0
0
16 Sep 2024
A Survey on Data Quality Dimensions and Tools for Machine Learning
Yuhan Zhou
Fengjiao Tu
Kewei Sha
Junhua Ding
Haihua Chen
174
14
0
28 Jun 2024
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
...
Ping Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
295
60
0
26 May 2024
Representation Debiasing of Generated Data Involving Domain Experts
User Modeling, Adaptation, and Personalization (UMAP), 2024
Aditya Bhattacharya
Simone Stumpf
K. Verbert
140
4
0
17 May 2024
A Comprehensive Survey on Data Augmentation
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2024
Zaitian Wang
Pengfei Wang
Kunpeng Liu
Pengyang Wang
Yanjie Fu
Chang-Tien Lu
Charu Aggarwal
Jian Pei
Yuanchun Zhou
ViT
485
64
0
15 May 2024
Large Language Models for Cyber Security: A Systematic Literature Review
HanXiang Xu
Shenao Wang
Ningke Li
Kaidi Wang
Yanjie Zhao
Kai Chen
Ting Yu
Yang Liu
Haoyu Wang
506
95
0
08 May 2024
Kernel Corrector LSTM
Rodrigo Tuna
Yassine Baghoussi
Carlos Soares
João Mendes-Moreira
KELM, AI4TS
86
0
0
28 Apr 2024
An In-Depth Analysis of Data Reduction Methods for Sustainable Deep Learning
Open Research Europe (ORE), 2024
Víctor Toscano-Durán
Javier Perera-Lago
Eduardo Paluzo-Hidalgo
Rocio Gonzalez-Diaz
Miguel A. Gutiérrez-Naranjo
Matteo Rucco
168
3
0
22 Mar 2024
Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models
Weijiao Zhang
Jindong Han
Zhao Xu
Hang Ni
Hao Liu
Hui Xiong
AI4CE
470
24
0
30 Jan 2024
README: Bridging Medical Jargon and Lay Understanding for Patient Education through Data-Centric NLP
Zonghai Yao
Nandyala Siddharth Kantu
Guanghao Wei
Hieu Tran
Zhangqi Duan
Sunjae Kwon
Zhichao Yang
Readme annotation team
Hong-ye Yu
247
13
0
24 Dec 2023
KnowGPT: Knowledge Graph based Prompting for Large Language Models
Qinggang Zhang
Hao-Heng Chen
Hao Chen
Daochen Zha
Zailiang Yu
Xiao Huang
KELM, RALM
316
30
0
11 Dec 2023
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Tianyu Ding
Tianyi Chen
Haidong Zhu
Jiachen Jiang
Yiqi Zhong
Jinxin Zhou
Guangzhi Wang
Zhihui Zhu
Ilya Zharkov
Luming Liang
308
32
0
01 Dec 2023
Understanding Fairness Surrogate Functions in Algorithmic Fairness
Wei Yao
Zhanke Zhou
Zhicong Li
Bo Han
Yong Liu
253
7
0
17 Oct 2023
Towards Deep Learning Models Resistant to Transfer-based Adversarial Attacks via Data-centric Robust Learning
Yulong Yang
Chenhao Lin
Xiang Ji
Qiwei Tian
Qian Li
Hongshan Yang
Zhibo Wang
Chao Shen
166
7
0
15 Oct 2023
CODA: Temporal Domain Generalization via Concept Drift Simulator
Knowledge Discovery and Data Mining (KDD), 2023
Chia-Yuan Chang
Yu-Neng Chuang
Zhimeng Jiang
Kwei-Herng Lai
Anxiao Jiang
Na Zou
OOD
135
6
0
02 Oct 2023