pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

12 March 2024

Jing-ling Huang

Zheng Wang

Noah D. Goodman

Christopher D. Manning

Christopher Potts

Papers citing "pyvene: A Library for Understanding and Improving PyTorch Models via Interventions"

Title | Authors | Tag | (unlabeled counts) | Date
Personality Alignment of Large Language Models | Minjun Zhu, Linyi Yang, Yue Zhang | ALM | 28, 5, 0 | 21 Aug 2024
Uncovering Intermediate Variables in Transformers using Circuit Probing | Michael A. Lepori, Thomas Serre, Ellie Pavlick | | 46, 7, 0 | 07 Nov 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models | Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson | KELM | 180, 152, 0 | 28 Apr 2023
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations | Atticus Geiger, Zhengxuan Wu, Christopher Potts, Thomas F. Icard, Noah D. Goodman | CML | 73, 98, 0 | 05 Mar 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small | Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt | | 205, 486, 0 | 01 Nov 2022