ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
arXiv:2305.14975 · 24 May 2023
Katherine Tian, E. Mitchell, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, Christopher D. Manning

Papers citing "Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback" (50 of 228 citing papers shown)
JuStRank: Benchmarking LLM Judges for System Ranking
  Ariel Gera, Odellia Boni, Yotam Perlitz, Roy Bar-Haim, Lilach Eden, Asaf Yehudai (12 Dec 2024) · ALM, ELM

SMARTCAL: An Approach to Self-Aware Tool-Use Evaluation and Calibration
  Yuanhao Shen, Xiaodan Zhu, L. Chen (11 Dec 2024)

Training-Free Bayesianization for Low-Rank Adapters of Large Language Models
  H. Shi, Yibin Wang, Ligong Han, H. M. Zhang, Hao Wang (07 Dec 2024) · UQCV

Enhancing Trust in Large Language Models with Uncertainty-Aware Fine-Tuning
  R. Krishnan, Piyush Khanna, Omesh Tickoo (03 Dec 2024) · HILM

Is my Meeting Summary Good? Estimating Quality with a Multi-LLM Evaluator
  Frederic Kirstein, Terry Ruas, Bela Gipp (27 Nov 2024)

Text-to-SQL Calibration: No Need to Ask -- Just Rescale Model Probabilities
  Ashwin Ramachandran, Sunita Sarawagi (23 Nov 2024)

Unlocking Historical Clinical Trial Data with ALIGN: A Compositional Large Language Model System for Medical Coding
  Nabeel Seedat, Caterina Tozzi, Andrea Hita Ardiaca, M. Schaar, James Weatherall, Adam Taylor (20 Nov 2024)

Graph-based Confidence Calibration for Large Language Models
  Yukun Li, Sijia Wang, Lifu Huang, Li-Ping Liu (03 Nov 2024) · UQCV

Matchmaker: Self-Improving Large Language Model Programs for Schema Matching
  Nabeel Seedat, M. Schaar (31 Oct 2024)

Dynamic Strategy Planning for Efficient Question Answering with Large Language Models
  Tanmay Parekh, Pradyot Prakash, Alexander Radovic, Akshay Shekher, Denis Savenkov (30 Oct 2024) · LRM

Graph-based Uncertainty Metrics for Long-form Language Model Outputs
  Mingjian Jiang, Yangjun Ruan, Prasanna Sattigeri, Salim Roukos, Tatsunori Hashimoto (28 Oct 2024)

Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation
  Dongryeol Lee, Yerin Hwang, Yongil Kim, Joonsuk Park, Kyomin Jung (28 Oct 2024) · ELM

EfficientEQA: An Efficient Approach for Open Vocabulary Embodied Question Answering
  Kai Cheng, Zhengyuan Li, Xingpeng Sun, Byung-Cheol Min, Amrit Singh Bedi, Aniket Bera (26 Oct 2024)

Rethinking the Uncertainty: A Critical Review and Analysis in the Era of Large Language Models
  Mohammad Beigi, Sijia Wang, Ying Shen, Zihao Lin, Adithya Kulkarni, ..., Ming Jin, Jin-Hee Cho, Dawei Zhou, Chang-Tien Lu, Lifu Huang (26 Oct 2024)

Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
  L. Wang, Sheng Chen, Linnan Jiang, Shu Pan, Runze Cai, Sen Yang, Fei Yang (24 Oct 2024)

A Survey of Uncertainty Estimation in LLMs: Theory Meets Practice
  Hsiu-Yuan Huang, Yutong Yang, Zhaoxi Zhang, Sanwoo Lee, Yunfang Wu (20 Oct 2024)

LoGU: Long-form Generation with Uncertainty Expressions
  Ruihan Yang, Caiqi Zhang, Zhisong Zhang, Xinting Huang, Sen Yang, Nigel Collier, Dong Yu, Deqing Yang (18 Oct 2024) · HILM

Do LLMs estimate uncertainty well in instruction-following?
  Juyeon Heo, Miao Xiong, Christina Heinze-Deml, Jaya Narain (18 Oct 2024) · ELM

Accounting for Sycophancy in Language Model Uncertainty Estimation
  Anthony Sicilia, Mert Inan, Malihe Alikhani (17 Oct 2024)

Eliciting Uncertainty in Chain-of-Thought to Mitigate Bias against Forecasting Harmful User Behaviors
  Anthony Sicilia, Malihe Alikhani (17 Oct 2024)

Learning to Route LLMs with Confidence Tokens
  Yu-Neng Chuang, Helen Zhou, Prathusha Kameswara Sarma, Parikshit Gopalan, John Boccio, Sara Bolouki, Xia Hu (17 Oct 2024)

Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation
  Yiming Wang, Pei Zhang, Baosong Yang, Derek F. Wong, Rui-cang Wang (17 Oct 2024) · LRM

LLM Confidence Evaluation Measures in Zero-Shot CSS Classification
  David Farr, Iain Cruickshank, Nico Manzonelli, Nicholas Clark, Kate Starbird, Jevin West (16 Oct 2024)

MlingConf: A Comprehensive Study of Multilingual Confidence Estimation on Large Language Models
  Boyang Xue, Hongru Wang, Rui Wang, Sheng Wang, Zezhong Wang, Yiming Du, Bin Liang, Kam-Fai Wong (16 Oct 2024)

Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
  Jihan Yao, Wenxuan Ding, Shangbin Feng, Lucy Lu Wang, Yulia Tsvetkov (14 Oct 2024)

QUITE: Quantifying Uncertainty in Natural Language Text in Bayesian Reasoning Scenarios
  Timo Pierre Schrader, Lukas Lange, Simon Razniewski, Annemarie Friedrich (14 Oct 2024) · UQLM

Taming Overconfidence in LLMs: Reward Calibration in RLHF
  Jixuan Leng, Chengsong Huang, Banghua Zhu, Jiaxin Huang (13 Oct 2024)

Calibrating Verbalized Probabilities for Large Language Models
  Cheng Wang, Gyuri Szarvas, Georges Balazs, Pavel Danchenko, P. Ernst (09 Oct 2024)

Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs
  Ruijia Niu, D. Wu, Rose Yu, Yi-An Ma (09 Oct 2024)

Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models
  Bozhou Li, Hao Liang, Yang Li, Fangcheng Fu, Hongzhi Yin, Conghui He, Wentao Zhang (08 Oct 2024) · KELM, CLL

Calibrating Expressions of Certainty
  Peiqi Wang, Barbara D. Lam, Yingcheng Liu, Ameneh Asgari-Targhi, Rameswar Panda, W. Wells, Tina Kapur, Polina Golland (06 Oct 2024)

LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
  Hadas Orgad, Michael Toker, Zorik Gekhman, Roi Reichart, Idan Szpektor, Hadas Kotek, Yonatan Belinkov (03 Oct 2024) · HILM, AIFin

Calibrate to Discriminate: Improve In-Context Learning with Label-Free Comparative Inference
  Wei Cheng, Tianlu Wang, Yanmin Ji, Fan Yang, Keren Tan, Yiyu Zheng (03 Oct 2024)

FlipGuard: Defending Preference Alignment against Update Regression with Constrained Optimization
  Mingye Zhu, Yi Liu, Quan Wang, Junbo Guo, Zhendong Mao (01 Oct 2024)

Calibrating Language Models with Adaptive Temperature Scaling
  Johnathan Xie, Annie S. Chen, Yoonho Lee, Eric Mitchell, Chelsea Finn (29 Sep 2024)

A Survey on the Honesty of Large Language Models
  Siheng Li, Cheng Yang, Taiqiang Wu, Chufan Shi, Yuji Zhang, ..., Jie Zhou, Yujiu Yang, Ngai Wong, Xixin Wu, Wai Lam (27 Sep 2024) · HILM

HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection
  Xuefeng Du, Chaowei Xiao, Yixuan Li (26 Sep 2024) · HILM

Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework
  Lu Chen, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Xueqi Cheng (24 Sep 2024)

Confidence Estimation for LLM-Based Dialogue State Tracking
  Yi-Jyun Sun, Suvodip Dey, Dilek Z. Hakkani-Tür, Gökhan Tür (15 Sep 2024)

What is the Role of Small Models in the LLM Era: A Survey
  Lihu Chen, Gaël Varoquaux (10 Sep 2024) · ALM

Enhancing Healthcare LLM Trust with Atypical Presentations Recalibration
  Jeremy Qin, Bang Liu, Quoc Dinh Nguyen (05 Sep 2024)

Does Alignment Tuning Really Break LLMs' Internal Confidence?
  Hongseok Oh, Wonseok Hwang (31 Aug 2024)

Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain
  Francesca Grasso, Stefano Locci (30 Aug 2024)

Can Unconfident LLM Annotations Be Used for Confident Conclusions?
  Kristina Gligorić, Tijana Zrnic, Cinoo Lee, Emmanuel J. Candès, Dan Jurafsky (27 Aug 2024)

Crowd-Calibrator: Can Annotator Disagreement Inform Calibration in Subjective Tasks?
  Urja Khurana, Eric T. Nalisnick, Antske Fokkens, Swabha Swayamdipta (26 Aug 2024)

Are Large Language Models More Honest in Their Probabilistic or Verbalized Confidence?
  Shiyu Ni, Keping Bi, Lulu Yu, Jiafeng Guo (19 Aug 2024) · HILM

How Susceptible are LLMs to Influence in Prompts?
  Sotiris Anagnostidis, Jannis Bulian (17 Aug 2024) · LRM

Defining Boundaries: A Spectrum of Task Feasibility for Large Language Models
  Wenbo Zhang, Zihang Xu, Hengrui Cai (11 Aug 2024)

Cost-Effective Hallucination Detection for LLMs
  Simon Valentin, Jinmiao Fu, Gianluca Detommaso, Shaoyuan Xu, Giovanni Zappella, Bryan Wang (31 Jul 2024) · HILM

Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
  Jaehun Jung, Faeze Brahman, Yejin Choi (25 Jul 2024) · ALM