arXiv:2307.03056
Generalizing Backpropagation for Gradient-Based Interpretability
6 July 2023
Kevin Du
Lucas Torroba Hennigen
Niklas Stoehr
Alex Warstadt
Ryan Cotterell
MILM
FAtt
Papers citing "Generalizing Backpropagation for Gradient-Based Interpretability" (5 papers)
The Gradient of Algebraic Model Counting
Jaron Maene
Luc de Raedt
49
0
0
25 Feb 2025
Activation Scaling for Steering and Interpreting Language Models
Niklas Stoehr
Kevin Du
Vésteinn Snæbjarnarson
Robert West
Ryan Cotterell
Aaron Schein
LLMSV
LRM
29
4
0
07 Oct 2024
Exploring the Trade-off Between Model Performance and Explanation Plausibility of Text Classifiers Using Human Rationales
Lucas Resck
Marcos M. Raimundo
Jorge Poco
24
1
0
03 Apr 2024
Localizing Paragraph Memorization in Language Models
Niklas Stoehr
Mitchell Gordon
Chiyuan Zhang
Owen Lewis
MU
38
13
0
28 Mar 2024
Successor Heads: Recurring, Interpretable Attention Heads In The Wild
Rhys Gould
Euan Ong
George Ogden
Arthur Conmy
LRM
6
44
0
14 Dec 2023