A Close Look into the Calibration of Pre-trained Language Models

31 October 2022

Yangyi Chen

Lifan Yuan

Ganqu Cui

Zhiyuan Liu

Heng Ji

ArXiv PDF HTML

Papers citing "A Close Look into the Calibration of Pre-trained Language Models"

26 / 26 papers shown

Title
Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach Jiancong Xiao Bojian Hou Zhanliang Wang Ruochen Jin Q. Long Weijie Su Li Shen 35 0 0 04 May 2025
Always Tell Me The Odds: Fine-grained Conditional Probability Estimation Liaoyaqi Wang Zhengping Jiang Anqi Liu Benjamin Van Durme 61 0 0 02 May 2025
MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness Junsheng Huang Zhitao He Sandeep Polisetty Q. Wang May Fung KELM 45 0 0 30 Apr 2025
Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review Toghrul Abbasli Kentaroh Toyoda Yuan Wang Leon Witt Muhammad Asif Ali Yukai Miao Dan Li Qingsong Wei UQCV 92 0 0 25 Apr 2025
Large Language Model Confidence Estimation via Black-Box Access Tejaswini Pedapati Amit Dhurandhar Soumya Ghosh Soham Dan P. Sattigeri 89 3 0 21 Feb 2025
Understanding the Capabilities and Limitations of Weak-to-Strong Generalization Wei Yao Wenkai Yang Zhilin Wang Yankai Lin Yong Liu ELM 107 1 0 03 Feb 2025
Confidence Calibration of Classifiers with Many Classes Adrien LeCoz Stéphane Herbin Faouzi Adjed UQCV 37 1 0 05 Nov 2024
On Calibration of LLM-based Guard Models for Reliable Content Moderation Hongfu Liu Hengguan Huang Hao Wang Xiangming Gu Ye Wang 55 2 0 14 Oct 2024
Learning to Maximize Mutual Information for Chain-of-Thought Distillation Xin Chen Hanxian Huang Yanjun Gao Yi Wang Jishen Zhao Ke Ding 35 11 0 05 Mar 2024
Calibrating Large Language Models with Sample Consistency Qing Lyu Kumar Shridhar Chaitanya Malaviya Li Zhang Yanai Elazar Niket Tandon Marianna Apidianaki Mrinmaya Sachan Chris Callison-Burch 43 23 0 21 Feb 2024
Learning Shortcuts: On the Misleading Promise of NLU in Language Models Geetanjali Bihani Julia Taylor Rayz 33 3 0 17 Jan 2024
Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models Yangyi Chen Karan Sikka Michael Cogswell Heng Ji Ajay Divakaran LRM 36 24 0 08 Sep 2023
Cognitive Architectures for Language Agents T. Sumers Shunyu Yao Karthik Narasimhan Thomas L. Griffiths LLMAG LM&Ro 54 153 0 05 Sep 2023
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations Lifan Yuan Yangyi Chen Ganqu Cui Hongcheng Gao Fangyuan Zou Xingyi Cheng Heng Ji Zhiyuan Liu Maosong Sun 39 73 0 07 Jun 2023
Taking Advice from ChatGPT Peter Zhang 37 5 0 11 May 2023
Exploring Predictive Uncertainty and Calibration in NLP: A Study on the Impact of Method & Data Scarcity Dennis Ulmer J. Frellsen Christian Hardmeier 189 22 0 20 Oct 2022
Uncertainty Quantification with Pre-trained Language Models: A Large-Scale Empirical Analysis Yuxin Xiao Paul Pu Liang Umang Bhatt W. Neiswanger Ruslan Salakhutdinov Louis-Philippe Morency 175 86 0 10 Oct 2022
Identifying and Mitigating Spurious Correlations for Improving Robustness in NLP Models Tianlu Wang Rohit Sridhar Diyi Yang Xuezhi Wang AAML 120 72 0 14 Oct 2021
Latent Hatred: A Benchmark for Understanding Implicit Hate Speech Mai Elsherief Caleb Ziems D. Muchlinski Vaishnavi Anupindi Jordyn Seybolt M. D. Choudhury Diyi Yang 103 236 0 11 Sep 2021
The Power of Scale for Parameter-Efficient Prompt Tuning Brian Lester Rami Al-Rfou Noah Constant VPVLM 280 3,848 0 18 Apr 2021
In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning Mamshad Nayeem Rizve Kevin Duarte Y. S. Rawat M. Shah 241 509 0 15 Jan 2021
DynaSent: A Dynamic Benchmark for Sentiment Analysis Christopher Potts Zhengxuan Wu Atticus Geiger Douwe Kiela 230 77 0 30 Dec 2020
Calibration of Pre-trained Transformers Shrey Desai Greg Durrett UQLM 243 289 0 17 Mar 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Jinpeng Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 297 6,959 0 20 Apr 2018
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles Balaji Lakshminarayanan Alexander Pritzel Charles Blundell UQCV BDL 276 5,661 0 05 Dec 2016
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning Y. Gal Zoubin Ghahramani UQCV BDL 285 9,138 0 06 Jun 2015