Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs

22 June 2023
Miao Xiong, Zhiyuan Hu, Xinyang Lu, Yifei Li, Jie Fu, Junxian He, Bryan Hooi

Papers citing "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs"

50 / 64 papers shown
Uncertainty Profiles for LLMs: Uncertainty Source Decomposition and Adaptive Model-Metric Selection
Pei-Fu Guo, Yun-Da Tsai, Shou-De Lin
UD
12 May 2025
Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach
Jiancong Xiao, Bojian Hou, Zhanliang Wang, Ruochen Jin, Q. Long, Weijie Su, Li Shen
04 May 2025
Uncertainty Quantification for Machine Learning in Healthcare: A Survey
L. J. L. Lopez, Shaza Elsharief, Dhiyaa Al Jorf, Firas Darwish, Congbo Ma, Farah E. Shamout
04 May 2025
Always Tell Me The Odds: Fine-grained Conditional Probability Estimation
Liaoyaqi Wang, Zhengping Jiang, Anqi Liu, Benjamin Van Durme
02 May 2025
Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding
Trilok Padhi, R. Kaur, Adam D. Cobb, Manoj Acharya, Anirban Roy, Colin Samplawski, Brian Matejek, Alexander M. Berenbeim, Nathaniel D. Bastian, Susmit Jha
30 Apr 2025
Entropy Heat-Mapping: Localizing GPT-Based OCR Errors with Sliding-Window Shannon Analysis
Alexei Kaltchenko
30 Apr 2025
Towards Automated Scoping of AI for Social Good Projects
Jacob Emmerson, Rayid Ghani, Zheyuan Ryan Shi
28 Apr 2025
Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers
Dylan Bouchard, Mohit Singh Chauhan
HILM
27 Apr 2025
Towards Robust Dialogue Breakdown Detection: Addressing Disruptors in Large Language Models with Self-Guided Reasoning
Abdellah Ghassel, Xianzhi Li, Xiaodan Zhu
26 Apr 2025
Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review
Toghrul Abbasli, Kentaroh Toyoda, Yuan Wang, Leon Witt, Muhammad Asif Ali, Yukai Miao, Dan Li, Qingsong Wei
UQCV
25 Apr 2025
Guiding VLM Agents with Process Rewards at Inference Time for GUI Navigation
Zhiyuan Hu, Shiyun Xiong, Yifan Zhang, See-Kiong Ng, Anh Tuan Luu, Bo An, Shuicheng Yan, Bryan Hooi
22 Apr 2025
Beyond Misinformation: A Conceptual Framework for Studying AI Hallucinations in (Science) Communication
Anqi Shao
18 Apr 2025
Gauging Overprecision in LLMs: An Empirical Study
Adil Bahaj, Hamed Rahimi, Mohamed Chetouani, Mounir Ghogho
16 Apr 2025
Uncertainty Distillation: Teaching Language Models to Express Semantic Confidence
Sophia Hager, David Mueller, Kevin Duh, Nicholas Andrews
18 Mar 2025
Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations
Ziwei Ji, L. Yu, Yeskendir Koishekenov, Yejin Bang, Anthony Hartshorn, Alan Schelten, Cheng Zhang, Pascale Fung, Nicola Cancedda
18 Mar 2025
The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
Richard Ren, Arunim Agarwal, Mantas Mazeika, Cristina Menghini, Robert Vacareanu, ..., Matias Geralnik, Adam Khoja, Dean Lee, Summer Yue, Dan Hendrycks
HILM ALM
05 Mar 2025
Large Language Model Confidence Estimation via Black-Box Access
Tejaswini Pedapati, Amit Dhurandhar, Soumya Ghosh, Soham Dan, P. Sattigeri
21 Feb 2025
Hallucination Detection in Large Language Models with Metamorphic Relations
Borui Yang, Md Afif Al Mamun, Jie M. Zhang, Gias Uddin
HILM
20 Feb 2025
BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models
Yibin Wang, H. Shi, Ligong Han, Dimitris N. Metaxas, Hao Wang
BDL UQLM
28 Jan 2025
Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the Effect of Epistemic Markers on LLM-based Evaluation
Dongryeol Lee, Yerin Hwang, Yongil Kim, Joonsuk Park, Kyomin Jung
ELM
28 Oct 2024
DAWN-ICL: Strategic Planning of Problem-solving Trajectories for Zero-Shot In-Context Learning
Xinyu Tang, Xiaolei Wang, Wayne Xin Zhao, Ji-Rong Wen
26 Oct 2024
Do LLMs Estimate Uncertainty Well in Instruction-Following?
Juyeon Heo, Miao Xiong, Christina Heinze-Deml, Jaya Narain
ELM
18 Oct 2024
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation
Yiming Wang, Pei Zhang, Baosong Yang, Derek F. Wong, Rui-cang Wang
LRM
17 Oct 2024
On Calibration of LLM-based Guard Models for Reliable Content Moderation
Hongfu Liu, Hengguan Huang, Hao Wang, Xiangming Gu, Ye Wang
14 Oct 2024
TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees
Weibin Liao, Xu Chu, Yasha Wang
LRM
10 Oct 2024
Output Scouting: Auditing Large Language Models for Catastrophic Responses
Andrew Bell, João Fonseca
KELM
04 Oct 2024
Integrative Decoding: Improve Factuality via Implicit Self-consistency
Yi Cheng, Xiao Liang, Yeyun Gong, Wen Xiao, Song Wang, ..., Wenjie Li, Jian Jiao, Qi Chen, Peng Cheng, Wayne Xiong
HILM
02 Oct 2024
Fast and Accurate Task Planning using Neuro-Symbolic Language Models and Multi-level Goal Decomposition
Minseo Kwon, Yaesol Kim, Young J. Kim
28 Sep 2024
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Zayne Sprague, Fangcong Yin, Juan Diego Rodriguez, Dongwei Jiang, Manya Wadhwa, Prasann Singhal, Xinyu Zhao, Xi Ye, Kyle Mahowald, Greg Durrett
ReLM LRM
18 Sep 2024
Enhancing Healthcare LLM Trust with Atypical Presentations Recalibration
Jeremy Qin, Bang Liu, Quoc Dinh Nguyen
05 Sep 2024
MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty
Yongjin Yang, Haneul Yoo, Hwaran Lee
13 Aug 2024
Harnessing Uncertainty-aware Bounding Boxes for Unsupervised 3D Object Detection
A. Benfenati, P. Causin, Hang Yu, Zhedong Zheng
3DPC
01 Aug 2024
PaCoST: Paired Confidence Significance Testing for Benchmark Contamination Detection in Large Language Models
Huixuan Zhang, Yun Lin, Xiaojun Wan
26 Jun 2024
Counterfactual Debating with Preset Stances for Hallucination Elimination of LLMs
Yi Fang, Moxin Li, Wenjie Wang, Hui Lin, Fuli Feng
LRM
17 Jun 2024
Cycles of Thought: Measuring LLM Confidence through Stable Explanations
Evan Becker, Stefano Soatto
05 Jun 2024
Evaluating Uncertainty-based Failure Detection for Closed-Loop LLM Planners
Zhi Zheng, Qian Feng, Hang Li, Alois C. Knoll, Jianxiang Feng
01 Jun 2024
A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models
Wenqi Fan, Yujuan Ding, Liang-bo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, Qing Li
RALM 3DV
10 May 2024
BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models
Yu Feng, Ben Zhou, Weidong Lin, Dan Roth
18 Apr 2024
Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation
Ruixin Yang, Dheeraj Rajagopal, S. Hayati, Bin Hu, Dongyeop Kang
LLMAG
14 Apr 2024
Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward
Xuan Xie, Jiayang Song, Zhehua Zhou, Yuheng Huang, Da Song, Lei Ma
OffRL
12 Apr 2024
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Neeloy Chakraborty, Melkior Ornik, Katherine Driggs-Campbell
LRM
25 Mar 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi, Kushagra Pandey, Robert Bamler, Ryan Cotterell, Sina Daubener, ..., F. Wenzel, Frank Wood, Stephan Mandt, Vincent Fortuin
28 Feb 2024
Calibrating Large Language Models with Sample Consistency
Qing Lyu, Kumar Shridhar, Chaitanya Malaviya, Li Zhang, Yanai Elazar, Niket Tandon, Marianna Apidianaki, Mrinmaya Sachan, Chris Callison-Burch
21 Feb 2024
Soft Self-Consistency Improves Language Model Agents
Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal
LLMAG
20 Feb 2024
Enabling Weak LLMs to Judge Response Reliability via Meta Ranking
Zijun Liu, Boqun Kou, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Janet Liu
19 Feb 2024
Overconfident and Unconfident AI Hinder Human-AI Collaboration
Jingshu Li, Yitian Yang, Renwen Zhang, Yi-Chieh Lee
12 Feb 2024
The Tyranny of Possibilities in the Design of Task-Oriented LLM Systems: A Scoping Survey
Dhruv Dhamani, Mary Lou Maher
29 Dec 2023
Reducing LLM Hallucinations using Epistemic Neural Networks
Shreyas Verma, Kien Tran, Yusuf Ali, Guangyu Min
25 Dec 2023
Methods to Estimate Large Language Model Confidence
Maia Kotelanski, Robert Gallo, Ashwin Nayak, Thomas Savage
LM&MA
28 Nov 2023
A Survey on Multimodal Large Language Models for Autonomous Driving
Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Yang Zhou, ..., Xinrui Yan, Shuqi Mei, Jianguo Cao, Ziran Wang, Chao Zheng
21 Nov 2023