Full-ECE: A Metric For Token-level Calibration on Large Language Models

17 June 2024

Han Liu

Yupeng Zhang

Bingning Wang

Weipeng Chen

Xiaolin Hu

Abstract

Deep Neural Networks (DNNs) excel in various domains but face challenges in providing accurate uncertainty estimates, which are crucial for high-stakes applications. Large Language Models (LLMs) have recently emerged as powerful tools, demonstrating exceptional performance in language tasks. However, traditional calibration metrics such as Expected Calibration Error (ECE) and classwise-ECE (cw-ECE) are inadequate for LLMs due to their vast vocabularies, data complexity, and distributional focus. To address this, we propose a novel calibration concept called full calibration and introduce its corresponding metric, Full-ECE. Full-ECE evaluates the entire predicted probability distribution, offering a more accurate and robust measure of calibration for LLMs.
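To make the contrast concrete, here is a minimal sketch in Python with NumPy. The first function is the standard top-label ECE, which bins only the confidence of the highest-probability class. The second is one plausible reading of "evaluates the entire predicted probability distribution": every per-token, per-class probability is binned and compared against how often that class was the true token. The abstract does not give the formula, so the function names, the equal-width binning, and the choice of 10 bins are all assumptions made for illustration, not the authors' definition.

```python
import numpy as np


def _binned_gap(p, hit, n_bins):
    """Weighted mean |empirical frequency - mean probability| over equal-width bins."""
    # Equal-width bins on [0, 1]; p == 1.0 is folded into the last bin.
    idx = np.minimum((p * n_bins).astype(int), n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            # mask.mean() is the fraction of all entries that fall in this bin.
            total += mask.mean() * abs(hit[mask].mean() - p[mask].mean())
    return total


def top_label_ece(probs, labels, n_bins=10):
    """Standard ECE: only the top-1 confidence of each prediction is binned."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    return _binned_gap(conf, correct, n_bins)


def full_distribution_ece(probs, labels, n_bins=10):
    """Assumed full-distribution variant: every (token, class) probability is binned."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    n, k = probs.shape
    # One-hot indicator of the true class for each prediction.
    hits = np.zeros((n, k))
    hits[np.arange(n), labels] = 1.0
    return _binned_gap(probs.ravel(), hits.ravel(), n_bins)
```

Here `probs` is an array of shape (number of predictions, vocabulary size) whose rows sum to 1, and `labels` holds the index of the true token for each row. Unlike the top-label version, the second function also penalizes miscalibrated probability mass on tokens that were not the argmax, which matters when text is sampled from the whole distribution.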
