arXiv:2307.09302

Conformal prediction under ambiguous ground truth

18 July 2023
David Stutz
Abhijit Guha Roy
Tatiana Matejovicova
Patricia Strachan
A. Cemgil
Arnaud Doucet
Abstract

Conformal Prediction (CP) enables rigorous uncertainty quantification by constructing a prediction set $C(X)$ satisfying $\mathbb{P}(Y \in C(X)) \geq 1-\alpha$ for a user-chosen $\alpha \in [0,1]$, relying on calibration data $(X_1,Y_1),\ldots,(X_n,Y_n)$ from $\mathbb{P}=\mathbb{P}^{X} \otimes \mathbb{P}^{Y|X}$. It is typically implicitly assumed that $\mathbb{P}^{Y|X}$ is the "true" posterior label distribution. However, in many real-world scenarios, the labels $Y_1,\ldots,Y_n$ are obtained by aggregating expert opinions using a voting procedure, resulting in a one-hot distribution $\mathbb{P}_{vote}^{Y|X}$. For such "voted" labels, CP guarantees therefore hold with respect to $\mathbb{P}_{vote}=\mathbb{P}^{X} \otimes \mathbb{P}_{vote}^{Y|X}$ rather than the true distribution $\mathbb{P}$. In cases with unambiguous ground truth labels, the distinction between $\mathbb{P}_{vote}$ and $\mathbb{P}$ is irrelevant. However, when experts disagree because the labels are ambiguous, approximating $\mathbb{P}^{Y|X}$ with a one-hot distribution $\mathbb{P}_{vote}^{Y|X}$ ignores this uncertainty. In this paper, we propose to leverage expert opinions to approximate $\mathbb{P}^{Y|X}$ using a non-degenerate distribution $\mathbb{P}_{agg}^{Y|X}$. We develop Monte Carlo CP procedures that provide guarantees with respect to $\mathbb{P}_{agg}=\mathbb{P}^{X} \otimes \mathbb{P}_{agg}^{Y|X}$ by sampling multiple synthetic pseudo-labels from $\mathbb{P}_{agg}^{Y|X}$ for each calibration example $X_1,\ldots,X_n$. In a case study of skin condition classification with significant disagreement among expert annotators, we show that applying CP with respect to $\mathbb{P}_{vote}$ under-covers expert annotations: when calibrated for $72\%$ coverage, it falls short by $10\%$ on average. Our Monte Carlo CP closes this gap both empirically and theoretically.
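
To make the sampling idea concrete, the following is a minimal sketch of calibrating a conformal threshold on pseudo-labels drawn from an aggregated label distribution. It is an illustration under stated assumptions, not the procedure from the paper: the function names, the nonconformity score $1 - p_{model}(y \mid x)$, and the simple pooled order statistic over all $n \cdot m$ scores are choices made here for illustration, and the paper's exact calibration rule and coverage guarantee are not reproduced. The classifier outputs and expert plausibilities are random stand-ins.

```python
import numpy as np


def sample_pseudo_labels(plausibilities, m, rng):
    """Draw m pseudo-labels per example from its row of class probabilities."""
    n, k = plausibilities.shape
    cdf = np.cumsum(plausibilities, axis=1)          # (n, k)
    u = rng.random((n, m, 1))                        # uniform draws in [0, 1)
    labels = (u > cdf[:, None, :]).sum(axis=2)       # inverse-CDF sampling, (n, m)
    return np.minimum(labels, k - 1)                 # guard against float rounding


def pooled_scores(model_probs, labels):
    """Nonconformity score 1 - p_model(y | x) for each (example, pseudo-label) pair."""
    return (1.0 - np.take_along_axis(model_probs, labels, axis=1)).ravel()


def conformal_threshold(scores, alpha):
    """Order statistic of rank ceil((N + 1)(1 - alpha)), clipped to [1, N].

    Assumption: a plain pooled quantile, not the paper's calibration rule.
    """
    N = scores.size
    rank = int(np.ceil((N + 1) * (1.0 - alpha)))
    rank = min(max(rank, 1), N)
    return np.sort(scores)[rank - 1]


def prediction_sets(model_probs, threshold):
    """Boolean mask of shape (n, k): True where a class is in the prediction set."""
    return (1.0 - model_probs) <= threshold


# Hypothetical data: random stand-ins for classifier outputs and expert plausibilities.
rng = np.random.default_rng(0)
n, k, m, alpha = 500, 4, 10, 0.1
model_probs = rng.dirichlet(np.ones(k), size=n)
plausibilities = rng.dirichlet(np.ones(k), size=n)

labels = sample_pseudo_labels(plausibilities, m, rng)   # (n, m) pseudo-labels
scores = pooled_scores(model_probs, labels)             # n * m calibration scores
tau = conformal_threshold(scores, alpha)
sets = prediction_sets(model_probs, tau)                # sets for the same inputs
```

With one-hot plausibilities every sampled pseudo-label equals the voted label, so the sketch reduces to ordinary split conformal calibration on repeated copies of the voted labels; the two approaches differ only when the aggregated distribution spreads mass over several classes.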
