ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs

23 May 2025

Papers citing "ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs"

Title | Authors | Tags | Counts | Date
SPEX: Scaling Feature Interaction Explanations for LLMs | Justin Singh Kang, Landon Butler, Abhineet Agarwal, Yigit Efe Erginbas, Ramtin Pedarsani, Kannan Ramchandran, Bin Yu | VLM, LRM | 114 2 0 | 20 Feb 2025
Closed-Form Feedback-Free Learning with Forward Projection | Robert O'Shea, Bipin Rajendran | | 45 18 0 | 27 Jan 2025
Attribution Patching Outperforms Automated Circuit Discovery | Aaquib Syed, Can Rager, Arthur Conmy | | 112 62 0 | 16 Oct 2023
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf | | 126 7,437 0 | 02 Oct 2019
A Fine-Grained Spectral Perspective on Neural Networks | Greg Yang, Hadi Salman | | 57 113 0 | 24 Jul 2019
Deep learning generalizes because the parameter-function map is biased towards simple functions | Guillermo Valle Pérez, Chico Q. Camargo, A. Louis | MLT, AI4CE | 55 231 0 | 22 May 2018
A Unified Approach to Interpreting Model Predictions | Scott M. Lundberg, Su-In Lee | FAtt | 538 21,613 0 | 22 May 2017
Axiomatic Attribution for Deep Networks | Mukund Sundararajan, Ankur Taly, Qiqi Yan | OOD, FAtt | 115 5,920 0 | 04 Mar 2017
Layer-wise Relevance Propagation for Neural Networks with Local Renormalization Layers | Alexander Binder, G. Montavon, Sebastian Lapuschkin, K. Müller, Wojciech Samek | FAtt | 54 456 0 | 04 Apr 2016