Understanding the Dark Side of LLMs' Intrinsic Self-Correction

19 December 2024

Papers citing "Understanding the Dark Side of LLMs' Intrinsic Self-Correction"

2 / 2 papers shown

Title
When Do LLMs Admit Their Mistakes? Understanding the Role of Model Belief in Retraction
Yuqing Yang, Robin Jia
KELM, LRM | 63 | 0 | 0 | 22 May 2025

Smaller Large Language Models Can Do Moral Self-Correction
Guangliang Liu, Zhiyu Xue, Rongrong Wang, K. Johnson, Kristen Marie Johnson
LRM | 57 | 0 | 0 | 30 Oct 2024