Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2312.02073
Cited By
A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia
4 December 2023
Giovanni Monea
Maxime Peyrard
Martin Josifoski
Vishrav Chaudhary
Jason Eisner
Emre Kiciman
Hamid Palangi
Barun Patra
Robert West
KELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia"
19 / 19 papers shown
Title
Adapting Large Language Models for Multi-Domain Retrieval-Augmented-Generation
Alexandre Misrahi
Nadezhda Chirkova
Maxime Louis
Vassilina Nikoulina
RALM
69
0
0
03 Apr 2025
Everything, Everywhere, All at Once: Is Mechanistic Interpretability Identifiable?
Maxime Méloux
Silviu Maniu
François Portet
Maxime Peyrard
29
0
0
28 Feb 2025
Controllable Context Sensitivity and the Knob Behind It
Julian Minder
Kevin Du
Niklas Stoehr
Giovanni Monea
Chris Wendler
Robert West
Ryan Cotterell
KELM
31
3
0
11 Nov 2024
Fact Recall, Heuristics or Pure Guesswork? Precise Interpretations of Language Models for Fact Completion
Denitsa Saynova
Lovisa Hagström
Moa Johansson
Richard Johansson
Marco Kuhlmann
HILM
21
0
0
18 Oct 2024
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs
Yujun Zhou
Jingdong Yang
Kehan Guo
Pin-Yu Chen
Tian Gao
...
Tian Gao
Werner Geyer
Nuno Moniz
Nitesh V Chawla
Xiangliang Zhang
21
4
0
18 Oct 2024
Probing Language Models on Their Knowledge Source
Zineddine Tighidet
Andrea Mogini
Jiali Mei
Benjamin Piwowarski
Patrick Gallinari
KELM
22
1
0
08 Oct 2024
A Mechanistic Interpretation of Syllogistic Reasoning in Auto-Regressive Language Models
Geonhee Kim
Marco Valentino
André Freitas
LRM
AI4CE
20
7
0
16 Aug 2024
ACCORD: Closing the Commonsense Measurability Gap
François Roewer-Després
Jinyue Feng
Zining Zhu
Frank Rudzicz
LRM
24
0
0
04 Jun 2024
Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska
E. Gavves
AI4CE
22
95
0
22 Apr 2024
Monotonic Representation of Numeric Properties in Language Models
Benjamin Heinzerling
Kentaro Inui
KELM
MILM
26
9
0
15 Mar 2024
Do Llamas Work in English? On the Latent Language of Multilingual Transformers
Chris Wendler
V. Veselovsky
Giovanni Monea
Robert West
45
65
0
16 Feb 2024
Do Androids Know They're Only Dreaming of Electric Sheep?
Sky CH-Wang
Benjamin Van Durme
Jason Eisner
Chris Kedzie
HILM
14
25
0
28 Dec 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva
Jasmijn Bastings
Katja Filippova
Amir Globerson
KELM
180
152
0
28 Apr 2023
Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction
Martin Josifoski
Marija Sakota
Maxime Peyrard
Robert West
SyDa
54
76
0
07 Mar 2023
Quantifying Context Mixing in Transformers
Hosein Mohebbi
Willem H. Zuidema
Grzegorz Chrupała
A. Alishahi
156
24
0
30 Jan 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
205
486
0
01 Nov 2022
Causal Proxy Models for Concept-Based Model Explanations
Zhengxuan Wu
Karel DÓosterlinck
Atticus Geiger
Amir Zur
Christopher Potts
MILM
68
35
0
28 Sep 2022
Entity-Based Knowledge Conflicts in Question Answering
Shayne Longpre
Kartik Perisetla
Anthony Chen
Nikhil Ramesh
Chris DuBois
Sameer Singh
HILM
230
177
0
10 Sep 2021
Measuring and Improving Consistency in Pretrained Language Models
Yanai Elazar
Nora Kassner
Shauli Ravfogel
Abhilasha Ravichander
Eduard H. Hovy
Hinrich Schütze
Yoav Goldberg
HILM
246
273
0
01 Feb 2021
1