2303.10158
Data-centric Artificial Intelligence: A Survey
ACM Computing Surveys (ACM Comput. Surv.), 2023
17 March 2023
Daochen Zha
Zaid Pervaiz Bhat
Kwei-Herng Lai
Fan Yang
Zhimeng Jiang
Shaochen Zhong
Helen Zhou
Papers citing
"Data-centric Artificial Intelligence: A Survey"
50 / 65 papers shown
"AGI" team at SHROOM-CAP: Data-Centric Approach to Multilingual Hallucination Detection using XLM-RoBERTa
Harsh Rathva
Pruthwik Mishra
Shrikant Malviya
HILM
30
0
0
23 Nov 2025
What's the next frontier for Data-centric AI? Data Savvy Agents
Nabeel Seedat
Jiashuo Liu
Mihaela van der Schaar
96
0
0
02 Nov 2025
MalDataGen: A Modular Framework for Synthetic Tabular Data Generation in Malware Detection
K. Paim
Angelo Gaspar Diniz Nogueira
Diego Kreutz
Weverton Cordeiro
R. Mansilha
36
1
0
01 Nov 2025
Accumulative SGD Influence Estimation for Data Attribution
Yunxiao Shi
Shuo Yang
Yixin Su
Rui-Xun Zhang
Min Xu
TDI
195
0
0
30 Oct 2025
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
Yueqi Song
Ketan Ramaneti
Zaid A. W. Sheikh
Z. Chen
Boyu Gou
...
Xiang Yue
Tao Yu
Huan Sun
Yu-Chuan Su
Graham Neubig
152
0
0
28 Oct 2025
Filtering instances and rejecting predictions to obtain reliable models in healthcare
Maria Gabriela Valeriano
David Kohan Marzagão
Alfredo Montelongo
Carlos Roberto Veiga Kiffer
Natan Katz
A. C. Lorena
62
0
0
28 Oct 2025
Reliability of Large Language Model Generated Clinical Reasoning in Assisted Reproductive Technology: Blinded Comparative Evaluation Study
Dou Liu
Ying Long
Sophia Zuoqiu
Di Liu
Kang Li
Yiting Lin
Hanyi Liu
Rong Yin
Tian Tang
ELM
113
1
0
17 Oct 2025
Combining Discrepancy-Confusion Uncertainty and Calibration Diversity for Active Fine-Grained Image Classification
Yinghao Jin
Xi Yang
UQCV
172
0
0
29 Sep 2025
Optimizing Class Distributions for Bias-Aware Multi-Class Learning
Mirco Felske
Stefan Stiene
96
0
0
15 Sep 2025
MEGG: Replay via Maximally Extreme GGscore in Incremental Learning for Neural Recommendation Models
Yunxiao Shi
Shuo Yang
Haimin Zhang
Li Wang
Yongze Wang
Qiang Wu
Min Xu
100
1
0
09 Sep 2025
Distribution Shift Aware Neural Tabular Learning
Wangyang Ying
Nanxu Gong
Dongjie Wang
Xinyuan Wang
Arun Vignesh Malarkkan
Vivek Gupta
Chandan K. Reddy
Yanjie Fu
OOD
170
3
0
27 Aug 2025
Chunked Data Shapley: A Scalable Dataset Quality Assessment for Machine Learning
Andreas Loizou
Dimitrios Tsoumakos
TDI
136
0
0
22 Aug 2025
Defining and Benchmarking a Data-Centric Design Space for Brain Graph Construction
Qinwen Ge
Roza G. Bayrak
Anwar Said
Catie Chang
X. Koutsoukos
Tyler Derr
64
0
0
17 Aug 2025
OpenConstruction: A Systematic Synthesis of Open Visual Datasets for Data-Centric Artificial Intelligence in Construction Monitoring
Ruoxin Xiong
Yanyu Wang
Jiannan Cai
Kaijian Liu
Yuansheng Zhu
P. Tang
Nora El-Gohary
3DV
AI4TS
64
0
0
15 Aug 2025
LoSemB: Logic-Guided Semantic Bridging for Inductive Tool Retrieval
Luyao Zhuang
Qinggang Zhang
Huachi Zhou
Juhua Liu
Qing Li
Xiao Huang
RALM
KELM
87
1
0
11 Aug 2025
Empowering Time Series Forecasting with LLM-Agents
Chin-Chia Michael Yeh
Vivian Lai
Uday Singh Saini
Xiran Fan
Yujie Fan
Junpeng Wang
Xin Dai
Yan Zheng
AI4TS
LLMAG
AIFin
AI4CE
210
3
0
06 Aug 2025
Feature Shift Localization Network
Míriam Barrabés
D. M. Montserrat
Kapal Dev
A. Ioannidis
OOD
156
0
0
10 Jun 2025
Enhancing Orthopox Image Classification Using Hybrid Machine Learning and Deep Learning Models
Alejandro Puente-Castro
Enrique Fernández-Blanco
Daniel Rivero
Andres Molares-Ulloa
113
0
0
06 Jun 2025
Collaborative Multi-Agent Reinforcement Learning for Automated Feature Transformation with Graph-Driven Path Optimization
Xiaohan Huang
Dongjie Wang
Zhiyuan Ning
Ziyue Qiao
Qingqing Long
Haowei Zhu
Yi Du
Min-Ying Wu
Yuanchun Zhou
Meng Xiao
394
3
0
24 Apr 2025
Global Renewables Watch: A Temporal Dataset of Solar and Wind Energy Derived from Satellite Imagery
Caleb Robinson
Anthony Ortiz
Allen Kim
Rahul Dodhia
Andrew Zolli
Shivaprakash K. Nagaraju
J. O
J. Kiesecker
J. L. Ferres
210
3
0
19 Mar 2025
Automatic quality control in multi-centric fetal brain MRI super-resolution reconstruction
Thomas Sanchez
Vladyslav Zalevsky
Angeline Mihailo
Gerard Martí Juan
E. Eixarch
Andras Jakab
Vincent Dunet
Mériam Koob
G. Auzias
Meritxell Bach Cuadra
269
0
0
13 Mar 2025
The Algorithmic State Architecture (ASA): An Integrated Framework for AI-Enabled Government
Zeynep Engin
Jon Crowcroft
David Hand
Philip Treleaven
276
4
0
11 Mar 2025
Empowering Edge Intelligence: A Comprehensive Survey on On-Device AI Models
ACM Computing Surveys (ACM Comput. Surv.), 2025
Xubin Wang
Zhiqing Tang
Jianxiong Guo
Tianhui Meng
Chenhao Wang
Tian-sheng Wang
Weijia Jia
342
49
0
08 Mar 2025
EDCA - An Evolutionary Data-Centric AutoML Framework for Efficient Pipelines
Joana Simões
João Correia
856
1
0
06 Mar 2025
Analytics Modelling over Multiple Datasets using Vector Embeddings
International Conference on Database and Expert Systems Applications (DEXA), 2025
Andreas Loizou
Dimitrios Tsoumakos
372
0
0
24 Feb 2025
Is This Collection Worth My LLM's Time? Automatically Measuring Information Potential in Text Corpora
Tristan Karch
Luca Engel
Philippe Schwaller
Frédéric Kaplan
277
0
0
19 Feb 2025
Revisiting Rogers' Paradox in the Context of Human-AI Interaction
Katherine M. Collins
Umang Bhatt
Ilia Sucholutsky
306
2
0
16 Jan 2025
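Interpolation pour l'augmentation de données : Application à la gestion des adventices de la canne à sucre à La Réunion (Interpolation for data augmentation: application to weed management in sugarcane on Réunion Island)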
Frédérick Fabre Ferber
Dominique Gay
Jean-Christophe Soulié
Jean Diatta
Odalric-Ambrym Maillard
119
0
0
10 Jan 2025
Cognitive Edge Computing: A Comprehensive Survey on Optimizing Large Models and AI Agents for Pervasive Deployment
International Conference on Artificial Neural Networks (ICANN), 2025
Xubin Wang
Weijia Jia
413
21
0
04 Jan 2025
General Information Metrics for Improving AI Model Training Efficiency
Artificial Intelligence Review (AIR), 2025
Jianfeng Xu
Congcong Liu
Xiaoying Tan
Xiaojie Zhu
Anpeng Wu
...
Weijun Kong
Chun Li
Hu Xu
Kun Kuang
Leilei Gan
304
3
0
02 Jan 2025
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models
Yulei Qin
Yuncheng Yang
Pengcheng Guo
Gang Li
Hang Shao
Yuchen Shi
Zihan Xu
Yun Gu
Ke Li
Xing Sun
ALM
659
21
0
31 Dec 2024
Data Quality Control in Federated Instruction-tuning of Large Language Models
Yaxin Du
Guangyi Liu
Fengting Yuchi
W. Zhao
Jingjing Qu
Yanjie Wang
Siheng Chen
ALM
FedML
257
3
0
15 Oct 2024
Federated Data-Efficient Instruction Tuning for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Zhen Qin
Zhaomin Wu
Bingsheng He
Shuiguang Deng
FedML
286
3
0
14 Oct 2024
Scrambled text: training Language Models to correct OCR errors using synthetic data
Jonathan Bourne
SyDa
193
3
0
29 Sep 2024
AdapFair: Ensuring Adaptive Fairness for Machine Learning Operations
Yinghui Huang
Zihao Tang
Xiangyu Chang
FaML
172
0
0
23 Sep 2024
AceParse: A Comprehensive Dataset with Diverse Structured Texts for Academic Literature Parsing
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Huawei Ji
Cheng Deng
Bo Xue
Zhouyang Jin
Jiaxin Ding
Xiaoying Gan
Luoyi Fu
Xinbing Wang
Chenghu Zhou
155
0
0
16 Sep 2024
A Survey on Data Quality Dimensions and Tools for Machine Learning
Yuhan Zhou
Fengjiao Tu
Kewei Sha
Junhua Ding
Haihua Chen
174
14
0
28 Jun 2024
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
...
Ping Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
295
60
0
26 May 2024
Representation Debiasing of Generated Data Involving Domain Experts
User Modeling, Adaptation, and Personalization (UMAP), 2024
Aditya Bhattacharya
Simone Stumpf
K. Verbert
140
4
0
17 May 2024
A Comprehensive Survey on Data Augmentation
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2024
Zaitian Wang
Pengfei Wang
Kunpeng Liu
Pengyang Wang
Yanjie Fu
Chang-Tien Lu
Charu Aggarwal
Jian Pei
Yuanchun Zhou
ViT
485
64
0
15 May 2024
Large Language Models for Cyber Security: A Systematic Literature Review
HanXiang Xu
Shenao Wang
Ningke Li
Kaidi Wang
Yanjie Zhao
Kai Chen
Ting Yu
Yang Liu
Haoyu Wang
506
95
0
08 May 2024
Kernel Corrector LSTM
Rodrigo Tuna
Yassine Baghoussi
Carlos Soares
João Mendes-Moreira
KELM
AI4TS
86
0
0
28 Apr 2024
An In-Depth Analysis of Data Reduction Methods for Sustainable Deep Learning
Open Research Europe (ORE), 2024
Víctor Toscano-Durán
Javier Perera-Lago
Eduardo Paluzo-Hidalgo
Rocio Gonzalez-Diaz
Miguel A. Gutiérrez-Naranjo
Matteo Rucco
168
3
0
22 Mar 2024
Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models
Weijiao Zhang
Jindong Han
Zhao Xu
Hang Ni
Hao Liu
Hui Xiong
AI4CE
470
24
0
30 Jan 2024
README: Bridging Medical Jargon and Lay Understanding for Patient Education through Data-Centric NLP
Zonghai Yao
Nandyala Siddharth Kantu
Guanghao Wei
Hieu Tran
Zhangqi Duan
Sunjae Kwon
Zhichao Yang
Readme annotation team
Hong-ye Yu
247
13
0
24 Dec 2023
KnowGPT: Knowledge Graph based Prompting for Large Language Models
Qinggang Zhang
Hao-Heng Chen
Hao Chen
Daochen Zha
Zailiang Yu
Xiao Huang
KELM
RALM
316
30
0
11 Dec 2023
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Tianyu Ding
Tianyi Chen
Haidong Zhu
Jiachen Jiang
Yiqi Zhong
Jinxin Zhou
Guangzhi Wang
Zhihui Zhu
Ilya Zharkov
Luming Liang
308
32
0
01 Dec 2023
Understanding Fairness Surrogate Functions in Algorithmic Fairness
Wei Yao
Zhanke Zhou
Zhicong Li
Bo Han
Yong Liu
253
7
0
17 Oct 2023
Towards Deep Learning Models Resistant to Transfer-based Adversarial Attacks via Data-centric Robust Learning
Yulong Yang
Chenhao Lin
Xiang Ji
Qiwei Tian
Qian Li
Hongshan Yang
Zhibo Wang
Chao Shen
166
7
0
15 Oct 2023
CODA: Temporal Domain Generalization via Concept Drift Simulator
Knowledge Discovery and Data Mining (KDD), 2023
Chia-Yuan Chang
Yu-Neng Chuang
Zhimeng Jiang
Kwei-Herng Lai
Anxiao Jiang
Na Zou
OOD
135
6
0
02 Oct 2023