When Do LLMs Admit Their Mistakes? Understanding the Role of Model Belief in Retraction
Yuqing Yang, Robin Jia. 22 May 2025. arXiv 2505.16170. [KELM, LRM]
Papers citing "When Do LLMs Admit Their Mistakes? Understanding the Role of Model Belief in Retraction" (41 of 41 papers shown):
Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification
Anqi Zhang, Yulin Chen, Jane Pan, Chen Zhao, Aurojit Panda, Jinyang Li, He He. 07 Apr 2025. [ReLM, LRM]

Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Yang Sui, Yu-Neng Chuang, Guanchu Wang, Jiamu Zhang, Tianyi Zhang, ..., Hongyi Liu, Andrew Wen, Shaochen Zhong, Hanjie Chen. 20 Mar 2025. [OffRL, ReLM, LRM]

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, ..., Zizhuo Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu. 30 Dec 2024. [LRM, ReLM]

Confidence v.s. Critique: A Decomposition of Self-Correction Capability for LLMs
Zhiyong Yang, Yanzhe Zhang, Yudong Wang, Ziyao Xu, Junyang Lin, Zhifang Sui. 27 Dec 2024. [LRM]

Understanding the Dark Side of LLMs' Intrinsic Self-Correction
Qingjie Zhang, Han Qiu, Di Wang, Haoting Qian, Yiming Li, Tianwei Zhang, Minlie Huang. 19 Dec 2024. [LRM]

Distinguishing Ignorance from Error in LLM Hallucinations
Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov. 29 Oct 2024. [HILM]

LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Hadas Orgad, Michael Toker, Zorik Gekhman, Roi Reichart, Idan Szpektor, Hadas Kotek, Yonatan Belinkov. 03 Oct 2024. [HILM, AIFin]

Programming Refusal with Conditional Activation Steering
Bruce W. Lee, Inkit Padhi, Karthikeyan N. Ramamurthy, Erik Miehling, Pierre Dognin, Manish Nagireddy, Amit Dhurandhar. 06 Sep 2024. [LLMSV]

Refusal in Language Models Is Mediated by a Single Direction
Andy Arditi, Oscar Obeso, Aaquib Syed, Daniel Paleka, Nina Panickssery, Wes Gurnee, Neel Nanda. 17 Jun 2024.

Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction
Xiaoyuan Li, Wenjie Wang, Moxin Li, Junrong Guo, Yang Zhang, Fuli Feng. 02 Jun 2024. [ELM, LRM]

Large Language Models Can Self-Correct with Minimal Effort
Zhenyu Wu, Qingkai Zeng, Zhihan Zhang, Zhaoxuan Tan, Chao Shen, Meng Jiang. 23 May 2024. [KELM, LRM, ReLM]

Truth-value judgment in language models: belief directions are context sensitive
Stefan F. Schouten, Peter Bloem, Ilia Markov, Piek Vossen. 29 Apr 2024. [KELM]

Can LLMs Learn from Previous Mistakes? Investigating LLMs' Errors to Boost for Reasoning
Yongqi Tong, Dawei Li, Sizhe Wang, Yujia Wang, Fei Teng, Jingbo Shang. 29 Mar 2024. [LRM]

On Large Language Models' Hallucination with Regard to Known Facts
Che Jiang, Biqing Qi, Xiangyu Hong, Dayuan Fu, Yang Cheng, Fandong Meng, Mo Yu, Bowen Zhou, Jie Zhou. 29 Mar 2024. [HILM, LRM]

LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Yaowei Zheng, Richong Zhang, Junhao Zhang, Yanhan Ye, Zheyan Luo, Zhangchi Feng, Yongqiang Ma. 20 Mar 2024.

Bugs in Large Language Models Generated Code: An Empirical Study
Florian Tambon, Arghavan Moradi Dakhel, Amin Nikanjam, Foutse Khomh, Michel C. Desmarais, G. Antoniol. 13 Mar 2024. [ELM]

Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
Nikhil Prakash, Tamar Rott Shaham, Tal Haklay, Yonatan Belinkov, David Bau. 22 Feb 2024.

Hallucination is Inevitable: An Innate Limitation of Large Language Models
Ziwei Xu, Sanjay Jain, Mohan S. Kankanhalli. 22 Jan 2024. [HILM, LRM]

The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models
Junyi Li, Jie Chen, Ruiyang Ren, Xiaoxue Cheng, Wayne Xin Zhao, Jian-Yun Nie, Ji-Rong Wen. 06 Jan 2024. [HILM]

Do Androids Know They're Only Dreaming of Electric Sheep?
Sky CH-Wang, Benjamin Van Durme, Jason Eisner, Chris Kedzie. 28 Dec 2023. [HILM]

Alignment for Honesty
Yuqing Yang, Ethan Chern, Xipeng Qiu, Graham Neubig, Pengfei Liu. 12 Dec 2023.

LLMs cannot find reasoning errors, but can correct them given the error location
Gladys Tyen, Hassan Mansoor, Victor Carbune, Peter Chen, Tony Mak. 14 Nov 2023. [LRM]

Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts
Tengxiao Liu, Qipeng Guo, Yuqing Yang, Xiangkun Hu, Yue Zhang, Xipeng Qiu, Zheng Zhang. 23 Oct 2023. [LRM, LLMAG]

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Samuel Marks, Max Tegmark. 10 Oct 2023. [HILM]

Large Language Models Cannot Self-Correct Reasoning Yet
Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, Xinying Song, Denny Zhou. 03 Oct 2023. [ReLM, LRM]

Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Mert Yuksekgonul, Varun Chandrasekaran, Erik Jones, Suriya Gunasekar, Ranjita Naik, Hamid Palangi, Ece Kamar, Besmira Nushi. 26 Sep 2023. [HILM]

The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, Owain Evans. 21 Sep 2023. [LRM]

Chain-of-Verification Reduces Hallucination in Large Language Models
Shehzaad Dhuliawala, M. Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, Jason Weston. 20 Sep 2023. [LRM, HILM]

Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, ..., Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, Shuming Shi. 03 Sep 2023. [RALM, LRM, HILM]

Still No Lie Detector for Language Models: Probing Empirical and Conceptual Roadblocks
B. Levinstein, Daniel A. Herrmann. 30 Jun 2023.

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, ..., Dacheng Li, Eric Xing, Haotong Zhang, Joseph E. Gonzalez, Ion Stoica. 09 Jun 2023. [ALM, OSLM, ELM]

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg. 06 Jun 2023. [KELM, HILM]

How Language Model Hallucinations Can Snowball
Muru Zhang, Ofir Press, William Merrill, Alisa Liu, Noah A. Smith. 22 May 2023. [HILM, LRM]

Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson. 28 Apr 2023. [KELM]

The Internal State of an LLM Knows When It's Lying
A. Azaria, Tom Michael Mitchell. 26 Apr 2023. [HILM]

Self-Refine: Iterative Refinement with Self-Feedback
Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, ..., Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, Peter Clark. 30 Mar 2023. [ReLM, LRM, DiffM]

Discovering Latent Knowledge in Language Models Without Supervision
Collin Burns, Haotian Ye, Dan Klein, Jacob Steinhardt. 07 Dec 2022.

Language Models (Mostly) Know What They Know
Saurav Kadavath, Tom Conerly, Amanda Askell, T. Henighan, Dawn Drain, ..., Nicholas Joseph, Benjamin Mann, Sam McCandlish, C. Olah, Jared Kaplan. 11 Jul 2022. [ELM]

Locating and Editing Factual Associations in GPT
Kevin Meng, David Bau, A. Andonian, Yonatan Belinkov. 10 Feb 2022. [KELM]

Crowdsourcing Multiple Choice Science Questions
Johannes Welbl, Nelson F. Liu, Matt Gardner. 19 Jul 2017. [AI4Ed]

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
Mandar Joshi, Eunsol Choi, Daniel S. Weld, Luke Zettlemoyer. 09 May 2017. [RALM]