2402.12560
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
19 February 2024
Aryaman Arora
Daniel Jurafsky
Christopher Potts
Papers citing "CausalGym: Benchmarking causal interpretability methods on linguistic tasks"
8 / 8 papers shown
Title
Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages
Jannik Brinkmann
Chris Wendler
Christian Bartelt
Aaron Mueller
35
9
0
10 Jan 2025
Language models align with human judgments on key grammatical constructions
Jennifer Hu
Kyle Mahowald
G. Lupyan
Anna A. Ivanova
Roger Levy
22
10
0
19 Jan 2024
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Samuel Marks
Max Tegmark
HILM
85
164
0
10 Oct 2023
A Geometric Notion of Causal Probing
Clément Guerner
Anej Svete
Tianyu Liu
Alex Warstadt
Ryan Cotterell
LLMSV
24
12
0
27 Jul 2023
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Atticus Geiger
Zhengxuan Wu
Christopher Potts
Thomas F. Icard
Noah D. Goodman
CML
73
98
0
05 Mar 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
205
486
0
01 Nov 2022
Naturalistic Causal Probing for Morpho-Syntax
Afra Amini
Tiago Pimentel
Clara Meister
Ryan Cotterell
MILM
93
13
0
14 May 2022
Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
216
291
0
24 Feb 2021