781
v1v2v3v4v5 (latest)

A Comprehensive Study of Shapley Value in Data Analytics

Proceedings of the VLDB Endowment (PVLDB), 2024
Main:14 Pages
12 Figures
Bibliography:5 Pages
11 Tables
Abstract

Over the recent years, Shapley value (SV), a solution concept from cooperative game theory, has found numerous applications in data analytics (DA). This paper provides the first comprehensive study of SV used throughout the DA workflow, clarifying the key variables in defining DA-applicable SV and the essential functionalities that SV can provide for data scientists. We condense four primary challenges of using SV in DA, namely computation efficiency, approximation error, privacy preservation, and interpretability, then disentangle the resolution techniques from existing arts in this field, analyze and discuss the techniques w.r.t. each challenge and potential conflicts between challenges. We also implement SVBench, a modular and extensible open-sourced framework for developing SV applications in different DA tasks, and conduct extensive evaluations to validate our analyses and discussions. Based on the qualitative and quantitative results, we identify the limitations of current efforts for applying SV to DA and highlight the directions of future research and engineering.

View on arXiv
Comments on this paper