ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2307.03056
  4. Cited By
Generalizing Backpropagation for Gradient-Based Interpretability

Generalizing Backpropagation for Gradient-Based Interpretability

6 July 2023
Kevin Du
Lucas Torroba Hennigen
Niklas Stoehr
Alex Warstadt
Ryan Cotterell
    MILM
    FAtt
ArXivPDFHTML

Papers citing "Generalizing Backpropagation for Gradient-Based Interpretability"

5 / 5 papers shown
Title
The Gradient of Algebraic Model Counting
The Gradient of Algebraic Model Counting
Jaron Maene
Luc de Raedt
49
0
0
25 Feb 2025
Activation Scaling for Steering and Interpreting Language Models
Activation Scaling for Steering and Interpreting Language Models
Niklas Stoehr
Kevin Du
Vésteinn Snæbjarnarson
Robert West
Ryan Cotterell
Aaron Schein
LLMSV
LRM
29
4
0
07 Oct 2024
Exploring the Trade-off Between Model Performance and Explanation
  Plausibility of Text Classifiers Using Human Rationales
Exploring the Trade-off Between Model Performance and Explanation Plausibility of Text Classifiers Using Human Rationales
Lucas Resck
Marcos M. Raimundo
Jorge Poco
24
1
0
03 Apr 2024
Localizing Paragraph Memorization in Language Models
Localizing Paragraph Memorization in Language Models
Niklas Stoehr
Mitchell Gordon
Chiyuan Zhang
Owen Lewis
MU
38
13
0
28 Mar 2024
Successor Heads: Recurring, Interpretable Attention Heads In The Wild
Successor Heads: Recurring, Interpretable Attention Heads In The Wild
Rhys Gould
Euan Ong
George Ogden
Arthur Conmy
LRM
6
44
0
14 Dec 2023
1