Estimating Knowledge in Large Language Models Without Generating a Single Token

18 June 2024

Papers citing "Estimating Knowledge in Large Language Models Without Generating a Single Token"

19 / 19 papers shown

Title
Investigating task-specific prompts and sparse autoencoders for activation monitoring Henk Tillman Dan Mossing LLMSV 42 0 0 28 Apr 2025
Emergence and Evolution of Interpretable Concepts in Diffusion Models Berk Tinaz Zalan Fabian Mahdi Soltanolkotabi DiffM 19 0 0 21 Apr 2025
Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages Jannik Brinkmann Chris Wendler Christian Bartelt Aaron Mueller 41 9 0 10 Jan 2025
Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts? Sohee Yang Nora Kassner E. Gribovskaya Sebastian Riedel Mor Geva KELM LRM ReLM 72 4 0 25 Nov 2024
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models Javier Ferrando Oscar Obeso Senthooran Rajamanoharan Neel Nanda 67 10 0 21 Nov 2024
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains Yein Park Chanwoong Yoon Jungwoo Park Donghyeon Lee Minbyul Jeong Jaewoo Kang KELM 45 1 0 13 Oct 2024
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations Hadas Orgad Michael Toker Zorik Gekhman Roi Reichart Idan Szpektor Hadas Kotek Yonatan Belinkov HILM AIFin 42 24 0 03 Oct 2024
From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty Maor Ivgi Ori Yoran Jonathan Berant Mor Geva HILM 41 8 0 08 Jul 2024
LLM Internal States Reveal Hallucination Risk Faced With a Query Ziwei Ji Delong Chen Etsuko Ishii Samuel Cahyawijaya Yejin Bang Bryan Wilie Pascale Fung HILM LRM 26 18 0 03 Jul 2024
I've got the "Answer"! Interpretation of LLMs Hidden States in Question Answering Valeriya Goloviznina Evgeny Kotelnikov 19 1 0 04 Jun 2024
Locating and Editing Factual Associations in Mamba Arnab Sen Sharma David Atkinson David Bau KELM 66 16 0 04 Apr 2024
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations Jing-ling Huang Zhengxuan Wu Christopher Potts Mor Geva Atticus Geiger 53 24 0 27 Feb 2024
On Early Detection of Hallucinations in Factual Question Answering Ben Snyder Marius Moisescu Muhammad Bilal Zafar HILM 39 24 0 19 Dec 2023
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets Samuel Marks Max Tegmark HILM 91 164 0 10 Oct 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models Mor Geva Jasmijn Bastings Katja Filippova Amir Globerson KELM 189 260 0 28 Apr 2023
Crawling the Internal Knowledge-Base of Language Models Roi Cohen Mor Geva Jonathan Berant Amir Globerson 170 74 0 30 Jan 2023
Finding a Balanced Degree of Automation for Summary Evaluation Shiyue Zhang Mohit Bansal 47 32 0 23 Sep 2021
A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation Tianyu Liu Yizhe Zhang Chris Brockett Yi Mao Zhifang Sui Weizhu Chen W. Dolan HILM 212 140 0 18 Apr 2021
Language Models as Knowledge Bases? Fabio Petroni Tim Rocktaschel Patrick Lewis A. Bakhtin Yuxiang Wu Alexander H. Miller Sebastian Riedel KELM AI4MH 393 2,216 0 03 Sep 2019