
Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information

Abstract

Estimating the difficulty of a dataset typically involves comparing state-of-the-art models to humans; the bigger the performance gap, the harder the dataset is said to be. However, this comparison provides little understanding of how difficult each instance in a given distribution is, or what attributes make the dataset difficult for a given model. To address these questions, we frame dataset difficulty -- w.r.t. a model $\mathcal{V}$ -- as the lack of $\mathcal{V}$-usable information (Xu et al., 2019), where a lower value indicates a more difficult dataset for $\mathcal{V}$. We further introduce pointwise $\mathcal{V}$-information (PVI) for measuring the difficulty of individual instances w.r.t. a given distribution. While standard evaluation metrics typically only compare different models for the same dataset, $\mathcal{V}$-usable information and PVI also permit the converse: for a given model $\mathcal{V}$, we can compare different datasets, as well as different instances/slices of the same dataset. Furthermore, our framework allows for the interpretability of different input attributes via transformations of the input, which we use to discover annotation artefacts in widely-used NLP benchmarks.
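In the paper's framing, PVI(x -> y) = -log2 g'[null](y) + log2 g[x](y), where g is $\mathcal{V}$ fine-tuned on (input, label) pairs and g' is the same model family fine-tuned on (null input, label) pairs; the mean PVI over a dataset estimates the $\mathcal{V}$-usable information. Below is a minimal sketch of the PVI computation for classification, assuming two HuggingFace-style classifiers that return objects with a .logits field; the function and variable names (pvi, null_model, etc.) are illustrative, not from the authors' code.

import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def pvi(model, null_model, inputs, null_inputs, labels):
    """Pointwise V-information for a batch of classification instances.

    model:      V fine-tuned on (x, y) pairs
    null_model: same architecture fine-tuned on (null input, y) pairs,
                e.g. with the input text replaced by an empty string
    Returns PVI in bits; higher PVI means the instance is easier for V.
    Averaging PVI over a dataset estimates I_V(X -> Y).
    """
    log_p = F.log_softmax(model(**inputs).logits, dim=-1)
    log_p_null = F.log_softmax(null_model(**null_inputs).logits, dim=-1)
    gold = labels.unsqueeze(-1)
    # log-probability of the gold label under each model (in nats),
    # then convert the difference to bits
    nats = (log_p.gather(-1, gold) - log_p_null.gather(-1, gold)).squeeze(-1)
    return nats / math.log(2)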

@article{ethayarajh2021_2110.08420,
  title={Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information},
  author={Kawin Ethayarajh and Yejin Choi and Swabha Swayamdipta},
  journal={arXiv preprint arXiv:2110.08420},
  year={2021}
}