Uncertainty in Language Models: Assessment through Rank-Calibration

4 April 2024
Xinmeng Huang
Shuo Li
Mengxin Yu
Matteo Sesia
Hamed Hassani
Insup Lee
Osbert Bastani
Edgar Dobriban
arXiv:2404.03163
Abstract

Language Models (LMs) have shown promising performance in natural language generation. However, as LMs often generate incorrect or hallucinated responses, it is crucial to correctly quantify their uncertainty in responding to given inputs. In addition to verbalized confidence elicited via prompting, many uncertainty measures (e.g., semantic entropy and affinity-graph-based measures) have been proposed. However, these measures can differ greatly, and it is unclear how to compare them, partly because they take values over different ranges (e.g., $[0,\infty)$ or $[0,1]$). In this work, we address this issue by developing a novel and practical framework, termed Rank-Calibration, to assess uncertainty and confidence measures for LMs. Our key tenet is that higher uncertainty (or lower confidence) should imply lower generation quality, on average. Rank-calibration quantifies deviations from this ideal relationship in a principled manner, without requiring ad hoc binary thresholding of the correctness score (e.g., ROUGE or METEOR). The broad applicability and the granular interpretability of our methods are demonstrated empirically.
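
The sketch below illustrates the tenet stated in the abstract, not the authors' exact estimator: it bins samples by uncertainty, estimates the average correctness score per bin, and measures how far the uncertainty ranks deviate from a perfectly anti-monotone relationship with the conditional-mean quality ranks. The function name, the bin count, and the synthetic ROUGE-like data are illustrative assumptions; consult the paper for the precise rank-calibration error definition.

```python
import numpy as np

def rank_calibration_error_sketch(uncertainty, quality, num_bins=20):
    """Illustrative (not the paper's exact) rank-calibration estimator.

    Idea from the abstract: higher uncertainty should imply lower
    generation quality on average. We estimate E[quality | uncertainty]
    with equal-mass bins and check whether the bins' uncertainty ranks
    and conditional-mean-quality ranks are perfectly reversed.
    A value of 0 means ideal rank-calibration.
    """
    uncertainty = np.asarray(uncertainty, dtype=float)
    quality = np.asarray(quality, dtype=float)

    # Empirical regression of quality on uncertainty via equal-mass binning.
    order = np.argsort(uncertainty)
    bins = np.array_split(order, num_bins)
    bin_mean_quality = np.array([quality[idx].mean() for idx in bins])

    # Relative rank of each bin by uncertainty (0 = lowest uncertainty).
    unc_rank = np.arange(num_bins) / (num_bins - 1)
    # Relative rank of each bin by conditional mean quality.
    qual_rank = np.argsort(np.argsort(bin_mean_quality)) / (num_bins - 1)

    # Ideal anti-monotone relation: uncertainty rank + quality rank = 1.
    return float(np.mean(np.abs(unc_rank - (1.0 - qual_rank))))

# Synthetic example: quality decreases with uncertainty, plus noise.
rng = np.random.default_rng(0)
u = rng.uniform(size=2000)                                   # uncertainty values
q = np.clip(1.0 - u + 0.1 * rng.normal(size=2000), 0.0, 1.0)  # ROUGE-like scores
print(rank_calibration_error_sketch(u, q))                   # close to 0
```

Because the comparison is rank-based, the same procedure can be applied to measures on different scales (e.g., entropies on $[0,\infty)$ or confidences on $[0,1]$) without thresholding the correctness score.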
