Hypothesis Testing the Circuit Hypothesis in LLMs

16 October 2024

Papers citing "Hypothesis Testing the Circuit Hypothesis in LLMs"

Evaluating Explanations: An Explanatory Virtues Framework for Mechanistic Interpretability -- The Strange Science Part I.ii
Kola Ayonrinde, Louis Jaburi [XAI], 02 May 2025

A Mathematical Philosophy of Explanations in Mechanistic Interpretability -- The Strange Science Part I.i
Kola Ayonrinde, Louis Jaburi [MILM], 01 May 2025

Are formal and functional linguistic mechanisms dissociated in language models?
Michael Hanna, Sandro Pezzelle, Yonatan Belinkov, 14 Mar 2025