In recent years, the Shapley value (SV), a solution concept from cooperative game theory, has found numerous applications in data analytics (DA). This paper provides the first comprehensive study of SV as used throughout the DA workflow, clarifying the key variables in defining a DA-applicable SV and the essential functionalities that SV can provide for data scientists. We distill four primary challenges of using SV in DA, namely computation efficiency, approximation error, privacy preservation, and interpretability. We then disentangle the techniques that existing work uses to address them, and analyze and discuss these techniques with respect to each challenge and the potential conflicts between challenges. We also implement SVBench, a modular and extensible open-source framework for developing SV applications in different DA tasks, and conduct extensive evaluations to validate our analyses and discussions. Based on the qualitative and quantitative results, we identify the limitations of current efforts to apply SV to DA and highlight directions for future research and engineering.
@article{lin2025_2412.01460,
  title={A Comprehensive Study of Shapley Value in Data Analytics},
  author={Hong Lin and Shixin Wan and Zhongle Xie and Ke Chen and Meihui Zhang and Lidan Shou and Gang Chen},
  journal={arXiv preprint arXiv:2412.01460},
  year={2025}
}