Conformal Prediction (CP) allows one to perform rigorous uncertainty quantification by constructing a prediction set $C(X)$ satisfying $\mathbb{P}(Y \in C(X)) \geq 1-\alpha$ for a user-chosen $\alpha \in (0,1)$, relying on calibration data $(X_1,Y_1),\dots,(X_n,Y_n)$ from $\mathbb{P} = \mathbb{P}^X \otimes \mathbb{P}^{Y|X}$. It is typically implicitly assumed that $\mathbb{P}^{Y|X}$ is the "true" posterior label distribution. However, in many real-world scenarios, the labels $Y_1,\dots,Y_n$ are obtained by aggregating expert opinions using a voting procedure, resulting in a one-hot distribution $\mathbb{P}_{\mathrm{vote}}^{Y|X}$. For such ``voted'' labels, CP guarantees are thus w.r.t. $\mathbb{P}_{\mathrm{vote}} = \mathbb{P}^X \otimes \mathbb{P}_{\mathrm{vote}}^{Y|X}$ rather than the true distribution $\mathbb{P}$. In cases with unambiguous ground truth labels, the distinction between $\mathbb{P}_{\mathrm{vote}}$ and $\mathbb{P}$ is irrelevant. However, when experts disagree because labels are ambiguous, approximating $\mathbb{P}^{Y|X}$ with a one-hot distribution $\mathbb{P}_{\mathrm{vote}}^{Y|X}$ ignores this uncertainty. In this paper, we propose to leverage expert opinions to approximate $\mathbb{P}^{Y|X}$ using a non-degenerate distribution $\mathbb{P}_{\mathrm{agg}}^{Y|X}$. We develop Monte Carlo CP procedures which provide guarantees w.r.t. $\mathbb{P}_{\mathrm{agg}} = \mathbb{P}^X \otimes \mathbb{P}_{\mathrm{agg}}^{Y|X}$ by sampling multiple synthetic pseudo-labels from $\mathbb{P}_{\mathrm{agg}}^{Y|X}$ for each calibration example $X_1,\dots,X_n$. In a case study of skin condition classification with significant disagreement among expert annotators, we show that applying CP w.r.t. $\mathbb{P}_{\mathrm{vote}}$ under-covers expert annotations: calibrated for $1-\alpha$ coverage, it falls short of that target on average; our Monte Carlo CP closes this gap both empirically and theoretically.
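To make the sampling idea concrete, here is a minimal Python sketch of split-conformal calibration where, for each calibration example, $m$ pseudo-labels are drawn from an aggregated (non-degenerate) expert label distribution and the resulting conformity scores are pooled before taking the conformal quantile. The function names, the score choice ($1$ minus the model probability), and the naive pooled quantile are illustrative assumptions only; they do not reproduce the corrected Monte Carlo CP procedures developed in the paper.

```python
# A minimal sketch of Monte Carlo conformal calibration with aggregated
# expert labels (illustrative assumptions, not the paper's reference code).
import numpy as np

def mc_conformal_threshold(probs_cal, p_agg_cal, alpha, m=10, rng=None):
    """Calibrate a conformal threshold using m pseudo-labels per example.

    probs_cal : (n, K) model probabilities for the n calibration inputs.
    p_agg_cal : (n, K) aggregated expert label distributions P_agg^{Y|X}.
    alpha     : target miscoverage level.
    m         : number of synthetic pseudo-labels sampled per example.
    """
    rng = np.random.default_rng(rng)
    n, K = probs_cal.shape
    scores = []
    for i in range(n):
        # Sample m pseudo-labels from the aggregated expert distribution.
        pseudo = rng.choice(K, size=m, p=p_agg_cal[i])
        # Conformity score: 1 - model probability of the pseudo-label.
        scores.extend(1.0 - probs_cal[i, pseudo])
    scores = np.asarray(scores)
    # Naive conformal quantile with the usual finite-sample correction,
    # applied here to the n*m pooled scores.
    q_level = np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores)
    return np.quantile(scores, min(q_level, 1.0), method="higher")

def prediction_set(probs_test, threshold):
    """Include every label y whose score 1 - p(y|x) is below the threshold."""
    return np.where(1.0 - probs_test <= threshold)[0]
```

With degenerate one-hot distributions `p_agg_cal`, this sketch reduces to standard split CP on the voted labels; the point of the non-degenerate case is that ambiguous examples contribute scores for several plausible labels rather than a single vote.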