Learning Transformer Programs

1 June 2023

Alexander Wettig

Papers citing "Learning Transformer Programs"

Title | Authors | Date
Representing Rule-based Chatbots with Transformers | Dan Friedman, Abhishek Panigrahi, Danqi Chen | 15 Jul 2024
Finding Transformer Circuits with Edge Pruning | Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen | 24 Jun 2024
Codebook Features: Sparse and Discrete Interpretability for Neural Networks | Alex Tamkin, Mohammad Taufeeque, Noah D. Goodman | 26 Oct 2023
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations | Atticus Geiger, Zhengxuan Wu, Christopher Potts, Thomas F. Icard, Noah D. Goodman | 05 Mar 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small | Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt | 01 Nov 2022
In-context Learning and Induction Heads | Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah | 24 Sep 2022
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation | Ofir Press, Noah A. Smith, M. Lewis | 27 Aug 2021
Probing Classifiers: Promises, Shortcomings, and Advances | Yonatan Belinkov | 24 Feb 2021