Unfamiliar Finetuning Examples Control How Language Models Hallucinate

Katie Kang, Eric Wallace, Claire Tomlin, Aviral Kumar, Sergey Levine
8 March 2024
arXiv:2403.05612
HILM, LRM

Papers citing "Unfamiliar Finetuning Examples Control How Language Models Hallucinate"

14 / 14 papers shown

Memorization and Knowledge Injection in Gated LLMs
Xu Pan, Ely Hahami, Zechen Zhang, H. Sompolinsky
KELM, CLL, RALM
30 Apr 2025

Think, Prune, Train, Improve: Scaling Reasoning without Scaling Models
Caia Costello, Simon Guo, Anna Goldie, Azalia Mirhoseini
ReLM, SyDa, LRM
25 Apr 2025

Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement
Siyuan Zhang, Y. Zhang, Yinpeng Dong, Hang Su
HILM, KELM
26 Feb 2025

Gradient Localization Improves Lifelong Pretraining of Language Models
Jared Fernandez, Yonatan Bisk, Emma Strubell
KELM
07 Nov 2024

Improving Model Factuality with Fine-grained Critique-based Evaluator
Yiqing Xie, Wenxuan Zhou, Pradyot Prakash, Di Jin, Yuning Mao, ..., Sinong Wang, Han Fang, Carolyn Rose, Daniel Fried, Hejia Zhang
HILM
24 Oct 2024

Automatic Curriculum Expert Iteration for Reliable LLM Reasoning
Zirui Zhao, Hanze Dong, Amrita Saha, Caiming Xiong, Doyen Sahoo
LRM
10 Oct 2024

From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty
Maor Ivgi, Ori Yoran, Jonathan Berant, Mor Geva
HILM
08 Jul 2024

Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?
Kevin Liu, Stephen Casper, Dylan Hadfield-Menell, Jacob Andreas
HILM
27 Nov 2023

The Internal State of an LLM Knows When It's Lying
A. Azaria, Tom Michael Mitchell
HILM
26 Apr 2023

Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, J. Gehrke, Eric Horvitz, ..., Scott M. Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang
ELM, AI4MH, AI4CE, ALM
22 Mar 2023

SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Potsawee Manakul, Adian Liusie, Mark J. F. Gales
HILM, LRM
15 Mar 2023

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM
04 Mar 2022

The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
AIMat
31 Dec 2020

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Sergey Levine, Aviral Kumar, George Tucker, Justin Fu
OffRL, GP
04 May 2020