Sparse Interventions in Language Models with Differentiable Masking

13 December 2021

Papers citing "Sparse Interventions in Language Models with Differentiable Masking"

MIB: A Mechanistic Interpretability Benchmark. Aaron Mueller, Atticus Geiger, Sarah Wiegreffe, Dana Arad, Iván Arcuschin, ..., Alessandro Stolfo, Martin Tutek, Amir Zur, David Bau, Yonatan Belinkov. 17 Apr 2025.
Causal interventions expose implicit situation models for commonsense language understanding. Takateru Yamakoshi, James L. McClelland, A. Goldberg, Robert D. Hawkins. 06 Jun 2023.
Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models. Peter Hase, Mohit Bansal, Been Kim, Asma Ghandeharioun. 10 Jan 2023.
Causal Proxy Models for Concept-Based Model Explanations. Zhengxuan Wu, Karel D'Oosterlinck, Atticus Geiger, Amir Zur, Christopher Potts. 28 Sep 2022.