Identifying Linear Relational Concepts in Large Language Models

15 November 2023

Papers citing "Identifying Linear Relational Concepts in Large Language Models"

Geospatial Mechanistic Interpretability of Large Language Models | Stef De Sabbata, Stefano Mizzaro, Kevin Roitero | AI4CE | 21 0 0 | 06 May 2025
On Linear Representations and Pretraining Data Frequency in Language Models | Jack Merullo, Noah A. Smith, Sarah Wiegreffe, Yanai Elazar | - | 32 0 0 | 16 Apr 2025
From Tokens to Lattices: Emergent Lattice Structures in Language Models | Bo Xiong, Steffen Staab | LRM | 18 0 0 | 04 Apr 2025
Mechanistic Interpretability for AI Safety -- A Review | Leonard Bereska, E. Gavves | AI4CE | 32 111 0 | 22 Apr 2024
Dissecting Recall of Factual Associations in Auto-Regressive Language Models | Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson | KELM | 189 260 0 | 28 Apr 2023
Toy Models of Superposition | Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, T. Henighan, ..., Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, C. Olah | AAML, MILM | 117 314 0 | 21 Sep 2022
Probing Classifiers: Promises, Shortcomings, and Advances | Yonatan Belinkov | - | 221 291 0 | 24 Feb 2021