ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs

23 May 2025

Papers citing "ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs"


Title | Authors | Tags | Counts | Date
SPEX: Scaling Feature Interaction Explanations for LLMs | Justin Singh Kang, Landon Butler, Abhineet Agarwal, Yigit Efe Erginbas, Ramtin Pedarsani, Kannan Ramchandran, Bin Yu | VLM, LRM | 114, 2, 0 | 20 Feb 2025
Attribution Patching Outperforms Automated Circuit Discovery | Aaquib Syed, Can Rager, Arthur Conmy | - | 110, 61, 0 | 16 Oct 2023
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf | - | 117, 7,386, 0 | 02 Oct 2019
A Fine-Grained Spectral Perspective on Neural Networks | Greg Yang, Hadi Salman | - | 57, 112, 0 | 24 Jul 2019
Deep learning generalizes because the parameter-function map is biased towards simple functions | Guillermo Valle Pérez, Chico Q. Camargo, A. Louis | MLT, AI4CE | 49, 231, 0 | 22 May 2018
Axiomatic Attribution for Deep Networks | Mukund Sundararajan, Ankur Taly, Qiqi Yan | OOD, FAtt | 108, 5,920, 0 | 04 Mar 2017
Layer-wise Relevance Propagation for Neural Networks with Local Renormalization Layers | Alexander Binder, G. Montavon, Sebastian Lapuschkin, K. Müller, Wojciech Samek | FAtt | 54, 456, 0 | 04 Apr 2016