Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs

22 June 2023
Miao Xiong, Zhiyuan Hu, Xinyang Lu, Yifei Li, Jie Fu, Junxian He, Bryan Hooi

Papers citing "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs"

50 / 64 papers shown
Uncertainty Profiles for LLMs: Uncertainty Source Decomposition and Adaptive Model-Metric Selection
Pei-Fu Guo, Yun-Da Tsai, Shou-De Lin
UD
12 May 2025
Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach
Jiancong Xiao, Bojian Hou, Zhanliang Wang, Ruochen Jin, Q. Long, Weijie Su, Li Shen
04 May 2025
Uncertainty Quantification for Machine Learning in Healthcare: A Survey
L. J. L. Lopez, Shaza Elsharief, Dhiyaa Al Jorf, Firas Darwish, Congbo Ma, Farah E. Shamout
04 May 2025
Always Tell Me The Odds: Fine-grained Conditional Probability Estimation
Liaoyaqi Wang, Zhengping Jiang, Anqi Liu, Benjamin Van Durme
02 May 2025
Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding
Trilok Padhi, R. Kaur, Adam D. Cobb, Manoj Acharya, Anirban Roy, Colin Samplawski, Brian Matejek, Alexander M. Berenbeim, Nathaniel D. Bastian, Susmit Jha
30 Apr 2025
Entropy Heat-Mapping: Localizing GPT-Based OCR Errors with Sliding-Window Shannon Analysis
Alexei Kaltchenko
30 Apr 2025
Towards Automated Scoping of AI for Social Good Projects
Jacob Emmerson, Rayid Ghani, Zheyuan Ryan Shi
28 Apr 2025
Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers
Dylan Bouchard, Mohit Singh Chauhan
HILM
27 Apr 2025
Towards Robust Dialogue Breakdown Detection: Addressing Disruptors in Large Language Models with Self-Guided Reasoning
Abdellah Ghassel, Xianzhi Li, Xiaodan Zhu
26 Apr 2025
Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review
Toghrul Abbasli, Kentaroh Toyoda, Yuan Wang, Leon Witt, Muhammad Asif Ali, Yukai Miao, Dan Li, Qingsong Wei
UQCV
25 Apr 2025
Guiding VLM Agents with Process Rewards at Inference Time for GUI Navigation
Zhiyuan Hu, Shiyun Xiong, Yifan Zhang, See-Kiong Ng, Anh Tuan Luu, Bo An, Shuicheng Yan, Bryan Hooi
22 Apr 2025
Beyond Misinformation: A Conceptual Framework for Studying AI Hallucinations in (Science) Communication
Anqi Shao
18 Apr 2025
Gauging Overprecision in LLMs: An Empirical Study
Adil Bahaj, Hamed Rahimi, Mohamed Chetouani, Mounir Ghogho
16 Apr 2025
Uncertainty Distillation: Teaching Language Models to Express Semantic Confidence
Sophia Hager, David Mueller, Kevin Duh, Nicholas Andrews
18 Mar 2025
Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations
Ziwei Ji, L. Yu, Yeskendir Koishekenov, Yejin Bang, Anthony Hartshorn, Alan Schelten, Cheng Zhang, Pascale Fung, Nicola Cancedda
18 Mar 2025
The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
Richard Ren, Arunim Agarwal, Mantas Mazeika, Cristina Menghini, Robert Vacareanu, ..., Matias Geralnik, Adam Khoja, Dean Lee, Summer Yue, Dan Hendrycks
HILM ALM
05 Mar 2025
Large Language Model Confidence Estimation via Black-Box Access
Tejaswini Pedapati, Amit Dhurandhar, Soumya Ghosh, Soham Dan, P. Sattigeri
21 Feb 2025
Hallucination Detection in Large Language Models with Metamorphic Relations
Borui Yang, Md Afif Al Mamun, Jie M. Zhang, Gias Uddin
HILM
20 Feb 2025
BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models
Yibin Wang, H. Shi, Ligong Han, Dimitris N. Metaxas, Hao Wang
BDL UQLM
28 Jan 2025
Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the Effect of Epistemic Markers on LLM-based Evaluation
Dongryeol Lee, Yerin Hwang, Yongil Kim, Joonsuk Park, Kyomin Jung
ELM
28 Oct 2024
DAWN-ICL: Strategic Planning of Problem-solving Trajectories for Zero-Shot In-Context Learning
Xinyu Tang, Xiaolei Wang, Wayne Xin Zhao, Ji-Rong Wen
26 Oct 2024
Do LLMs Estimate Uncertainty Well in Instruction-Following?
Juyeon Heo, Miao Xiong, Christina Heinze-Deml, Jaya Narain
ELM
18 Oct 2024
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation
Yiming Wang, Pei Zhang, Baosong Yang, Derek F. Wong, Rui-cang Wang
LRM
17 Oct 2024
On Calibration of LLM-based Guard Models for Reliable Content Moderation
Hongfu Liu, Hengguan Huang, Hao Wang, Xiangming Gu, Ye Wang
14 Oct 2024
TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees
Weibin Liao, Xu Chu, Yasha Wang
LRM
10 Oct 2024
Output Scouting: Auditing Large Language Models for Catastrophic Responses
Andrew Bell, João Fonseca
KELM
04 Oct 2024
Integrative Decoding: Improve Factuality via Implicit Self-consistency
Yi Cheng, Xiao Liang, Yeyun Gong, Wen Xiao, Song Wang, ..., Wenjie Li, Jian Jiao, Qi Chen, Peng Cheng, Wayne Xiong
HILM
02 Oct 2024
Fast and Accurate Task Planning using Neuro-Symbolic Language Models and Multi-level Goal Decomposition
Minseo Kwon, Yaesol Kim, Young J. Kim
28 Sep 2024
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Zayne Sprague, Fangcong Yin, Juan Diego Rodriguez, Dongwei Jiang, Manya Wadhwa, Prasann Singhal, Xinyu Zhao, Xi Ye, Kyle Mahowald, Greg Durrett
ReLM LRM
18 Sep 2024
Enhancing Healthcare LLM Trust with Atypical Presentations Recalibration
Jeremy Qin, Bang Liu, Quoc Dinh Nguyen
05 Sep 2024
MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty
Yongjin Yang, Haneul Yoo, Hwaran Lee
13 Aug 2024
Harnessing Uncertainty-aware Bounding Boxes for Unsupervised 3D Object Detection
A. Benfenati, P. Causin, Hang Yu, Zhedong Zheng
3DPC
01 Aug 2024
PaCoST: Paired Confidence Significance Testing for Benchmark Contamination Detection in Large Language Models
Huixuan Zhang, Yun Lin, Xiaojun Wan
26 Jun 2024
Counterfactual Debating with Preset Stances for Hallucination Elimination of LLMs
Yi Fang, Moxin Li, Wenjie Wang, Hui Lin, Fuli Feng
LRM
17 Jun 2024
Cycles of Thought: Measuring LLM Confidence through Stable Explanations
Evan Becker, Stefano Soatto
05 Jun 2024
Evaluating Uncertainty-based Failure Detection for Closed-Loop LLM Planners
Zhi Zheng, Qian Feng, Hang Li, Alois C. Knoll, Jianxiang Feng
01 Jun 2024
A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models
Wenqi Fan, Yujuan Ding, Liang-bo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, Qing Li
RALM 3DV
10 May 2024
BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models
Yu Feng, Ben Zhou, Weidong Lin, Dan Roth
18 Apr 2024
Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation
Ruixin Yang, Dheeraj Rajagopal, S. Hayati, Bin Hu, Dongyeop Kang
LLMAG
14 Apr 2024
Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward
Xuan Xie, Jiayang Song, Zhehua Zhou, Yuheng Huang, Da Song, Lei Ma
OffRL
12 Apr 2024
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Neeloy Chakraborty, Melkior Ornik, Katherine Driggs-Campbell
LRM
25 Mar 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi, Kushagra Pandey, Robert Bamler, Ryan Cotterell, Sina Daubener, ..., F. Wenzel, Frank Wood, Stephan Mandt, Vincent Fortuin
28 Feb 2024
Calibrating Large Language Models with Sample Consistency
Qing Lyu, Kumar Shridhar, Chaitanya Malaviya, Li Zhang, Yanai Elazar, Niket Tandon, Marianna Apidianaki, Mrinmaya Sachan, Chris Callison-Burch
21 Feb 2024
Soft Self-Consistency Improves Language Model Agents
Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal
LLMAG
20 Feb 2024
Enabling Weak LLMs to Judge Response Reliability via Meta Ranking
Zijun Liu, Boqun Kou, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Janet Liu
19 Feb 2024
Overconfident and Unconfident AI Hinder Human-AI Collaboration
Jingshu Li, Yitian Yang, Renwen Zhang, Yi-Chieh Lee
12 Feb 2024
The Tyranny of Possibilities in the Design of Task-Oriented LLM Systems: A Scoping Survey
Dhruv Dhamani, Mary Lou Maher
29 Dec 2023
Reducing LLM Hallucinations using Epistemic Neural Networks
Shreyas Verma, Kien Tran, Yusuf Ali, Guangyu Min
25 Dec 2023
Methods to Estimate Large Language Model Confidence
Maia Kotelanski, Robert Gallo, Ashwin Nayak, Thomas Savage
LM&MA
28 Nov 2023
A Survey on Multimodal Large Language Models for Autonomous Driving
Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Yang Zhou, ..., Xinrui Yan, Shuqi Mei, Jianguo Cao, Ziran Wang, Chao Zheng
21 Nov 2023