Class-Proportional Coreset Selection for Difficulty-Separable Data

Main: 7 pages
Bibliography: 3 pages
Appendix: 1 page
4 figures
4 tables
Abstract

High-quality training data is essential for building reliable and efficient machine learning systems. One-shot coreset selection addresses this need by pruning the dataset while maintaining or even improving model performance, often relying on data difficulty scores derived from training dynamics. However, most existing methods implicitly assume that data difficulty is homogeneous across classes, overlooking how difficulty varies from one class to another.
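
To make the contrast concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a per-sample difficulty score is already available, that a higher score means a harder sample, and that the hardest samples are the ones kept; all three are illustrative assumptions. The function names `global_select` and `class_proportional_select` and the toy data are hypothetical. The sketch compares a class-agnostic selection with one that keeps the same fraction of every class.

```python
import numpy as np

def global_select(scores, keep_fraction):
    """Keep the highest-scoring samples over the whole dataset, ignoring labels."""
    n_keep = int(round(keep_fraction * len(scores)))
    order = np.argsort(-scores, kind="stable")  # descending by score
    return np.sort(order[:n_keep])

def class_proportional_select(scores, labels, keep_fraction):
    """Keep the same fraction of every class, ranking samples within each class."""
    kept = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)            # indices of class c
        n_keep = int(round(keep_fraction * len(idx)))
        order = np.argsort(-scores[idx], kind="stable")
        kept.append(idx[order[:n_keep]])
    return np.sort(np.concatenate(kept))

# Toy data (hypothetical): class 0 is uniformly "easy", class 1 uniformly "hard".
labels = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.70, 0.75, 0.80, 0.90])

global_idx = global_select(scores, keep_fraction=0.5)
prop_idx = class_proportional_select(scores, labels, keep_fraction=0.5)

# Per-class counts in each selected subset.
global_counts = np.bincount(labels[global_idx], minlength=2)
prop_counts = np.bincount(labels[prop_idx], minlength=2)
print("global:", global_counts, "class-proportional:", prop_counts)
```

In this toy setting, where the classes differ sharply in difficulty, the class-agnostic selection keeps every sample of the hard class and almost none of the easy class, while the class-proportional variant preserves the original 6:4 class ratio in the pruned set.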
