Interpretable Failure Detection with Human-Level Concepts

7 February 2025
Kien X. Nguyen
Tang Li
Xi Peng
Abstract

Reliable failure detection is of paramount importance in safety-critical applications, yet neural networks are known to produce overconfident predictions for misclassified samples. As a result, existing confidence score functions remain problematic: they rely on category-level signals, the logits, to detect failures. We introduce a strategy that leverages human-level concepts for a dual purpose: to reliably detect when a model fails and to transparently interpret why. By integrating a nuanced array of signals for each category, our method enables a finer-grained assessment of the model's confidence. We present a simple yet highly effective approach based on the ordinal ranking of concept activations for the input image. Without bells and whistles, our method significantly reduces the false positive rate across diverse real-world image classification benchmarks, by 3.7% on ImageNet and 9% on EuroSAT.
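
The abstract describes the method only at a high level. The Python sketch below illustrates one plausible reading of the "ordinal ranking of concept activations" idea: score a prediction by how highly the predicted class's concepts rank among all concept activations, and reject low-scoring predictions as likely failures. The function name concept_rank_score, the class_concepts mapping, and the rejection threshold are all hypothetical assumptions for illustration, not the paper's actual implementation.

import numpy as np

def concept_rank_score(concept_acts: np.ndarray,
                       class_concepts: dict[int, list[int]],
                       predicted_class: int) -> float:
    """Hypothetical confidence score based on the ordinal ranking of
    concept activations; the paper's exact scoring rule may differ.

    concept_acts:   activations of all K concepts for one input image
                    (e.g., similarities between the image embedding and
                    concept text embeddings).
    class_concepts: maps each class to the indices of its concepts.
    """
    # Rank every concept by activation: rank 0 = most activated.
    order = np.argsort(-concept_acts)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(concept_acts))

    # Average normalized rank of the predicted class's concepts.
    idx = class_concepts[predicted_class]
    mean_rank = ranks[idx].mean() / (len(concept_acts) - 1)

    # High score = the class's own concepts dominate the ranking,
    # suggesting the prediction is trustworthy.
    return 1.0 - mean_rank

# Toy usage: flag a prediction as a likely failure when the score
# falls below a validation-tuned threshold (0.5 here is arbitrary).
rng = np.random.default_rng(0)
acts = rng.normal(size=50)                   # 50 concept activations
concepts_of = {0: [1, 4, 9], 1: [2, 7, 13]}  # toy concept sets
score = concept_rank_score(acts, concepts_of, predicted_class=0)
print("reject" if score < 0.5 else "accept", f"(score={score:.2f})")

Because the score aggregates several concept-level signals per category rather than a single logit, it gives the finer-grained confidence assessment the abstract points to; the threshold would be tuned on a held-out validation set.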

@article{nguyen2025_2502.05275,
  title={Interpretable Failure Detection with Human-Level Concepts},
  author={Kien X. Nguyen and Tang Li and Xi Peng},
  journal={arXiv preprint arXiv:2502.05275},
  year={2025}
}