A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia

A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia

4 December 2023

Martin Josifoski

Vishrav Chaudhary

Papers citing "A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia"

19 / 19 papers shown

Title
Adapting Large Language Models for Multi-Domain Retrieval-Augmented-Generation Alexandre Misrahi Nadezhda Chirkova Maxime Louis Vassilina Nikoulina RALM 69 0 0 03 Apr 2025
Everything, Everywhere, All at Once: Is Mechanistic Interpretability Identifiable? Maxime Méloux Silviu Maniu François Portet Maxime Peyrard 29 0 0 28 Feb 2025
Controllable Context Sensitivity and the Knob Behind It Julian Minder Kevin Du Niklas Stoehr Giovanni Monea Chris Wendler Robert West Ryan Cotterell KELM 31 3 0 11 Nov 2024
Fact Recall, Heuristics or Pure Guesswork? Precise Interpretations of Language Models for Fact Completion Denitsa Saynova Lovisa Hagström Moa Johansson Richard Johansson Marco Kuhlmann HILM 21 0 0 18 Oct 2024
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs Yujun Zhou Jingdong Yang Kehan Guo Pin-Yu Chen Tian Gao ... Tian Gao Werner Geyer Nuno Moniz Nitesh V Chawla Xiangliang Zhang 21 4 0 18 Oct 2024
Probing Language Models on Their Knowledge Source Zineddine Tighidet Andrea Mogini Jiali Mei Benjamin Piwowarski Patrick Gallinari KELM 22 1 0 08 Oct 2024
A Mechanistic Interpretation of Syllogistic Reasoning in Auto-Regressive Language Models Geonhee Kim Marco Valentino André Freitas LRM AI4CE 20 7 0 16 Aug 2024
ACCORD: Closing the Commonsense Measurability Gap François Roewer-Després Jinyue Feng Zining Zhu Frank Rudzicz LRM 24 0 0 04 Jun 2024
Mechanistic Interpretability for AI Safety -- A Review Leonard Bereska E. Gavves AI4CE 22 95 0 22 Apr 2024
Monotonic Representation of Numeric Properties in Language Models Benjamin Heinzerling Kentaro Inui KELM MILM 26 9 0 15 Mar 2024
Do Llamas Work in English? On the Latent Language of Multilingual Transformers Chris Wendler V. Veselovsky Giovanni Monea Robert West 45 65 0 16 Feb 2024
Do Androids Know They're Only Dreaming of Electric Sheep? Sky CH-Wang Benjamin Van Durme Jason Eisner Chris Kedzie HILM 14 25 0 28 Dec 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models Mor Geva Jasmijn Bastings Katja Filippova Amir Globerson KELM 180 152 0 28 Apr 2023
Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction Martin Josifoski Marija Sakota Maxime Peyrard Robert West SyDa 54 76 0 07 Mar 2023
Quantifying Context Mixing in Transformers Hosein Mohebbi Willem H. Zuidema Grzegorz Chrupała A. Alishahi 156 24 0 30 Jan 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small Kevin Wang Alexandre Variengien Arthur Conmy Buck Shlegeris Jacob Steinhardt 205 486 0 01 Nov 2022
Causal Proxy Models for Concept-Based Model Explanations Zhengxuan Wu Karel DÓosterlinck Atticus Geiger Amir Zur Christopher Potts MILM 68 35 0 28 Sep 2022
Entity-Based Knowledge Conflicts in Question Answering Shayne Longpre Kartik Perisetla Anthony Chen Nikhil Ramesh Chris DuBois Sameer Singh HILM 230 177 0 10 Sep 2021
Measuring and Improving Consistency in Pretrained Language Models Yanai Elazar Nora Kassner Shauli Ravfogel Abhilasha Ravichander Eduard H. Hovy Hinrich Schütze Yoav Goldberg HILM 246 273 0 01 Feb 2021