Penzai + Treescope: A Toolkit for Interpreting, Visualizing, and Editing Models As Data

1 August 2024

Papers citing "Penzai + Treescope: A Toolkit for Interpreting, Visualizing, and Editing Models As Data"

Title | Authors | Counts | Date
Scaling sparse feature circuit finding for in-context learning | Dmitrii Kharlapenko S. Kamath S Fazl Barez Arthur Conmy Neel Nanda | 23 0 0 | 18 Apr 2025
softmax is not enough (for sharp out-of-distribution) | Petar Veličković Christos Perivolaropoulos Federico Barbero Razvan Pascanu | 37 17 0 | 01 Oct 2024
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small | Kevin Wang Alexandre Variengien Arthur Conmy Buck Shlegeris Jacob Steinhardt | 210 486 0 | 01 Nov 2022