Adversarial Attacks on the Interpretation of Neuron Activation Maximization
Géraldin Nanfack, A. Fulleringer, Jonathan Marty, Michael Eickenberg, Eugene Belilovsky
arXiv 2306.07397 · 12 June 2023 · Tags: AAML, FAtt
Papers citing "Adversarial Attacks on the Interpretation of Neuron Activation Maximization" (7 of 7 papers shown):

Linear Explanations for Individual Neurons
Tuomas P. Oikarinen, Tsui-Wei Weng
FAtt, MILM · 29 · 5 · 0 · 10 May 2024

Reliability of CKA as a Similarity Measure in Deep Learning
Mohammad-Javad Davari, Stefan Horoi, A. Natik, Guillaume Lajoie, Guy Wolf, Eugene Belilovsky
AAML · 76 · 36 · 0 · 28 Oct 2022

Conformalized Fairness via Quantile Regression
Meichen Liu, Lei Ding, Dengdeng Yu, Wulong Liu, Linglong Kong, Bei Jiang
42 · 9 · 0 · 05 Oct 2022

Natural Language Descriptions of Deep Visual Features
Evan Hernandez, Sarah Schwettmann, David Bau, Teona Bagashvili, Antonio Torralba, Jacob Andreas
MILM · 196 · 117 · 0 · 26 Jan 2022

"Will You Find These Shortcuts?" A Protocol for Evaluating the Faithfulness of Input Salience Methods for Text Classification
Jasmijn Bastings, Sebastian Ebert, Polina Zablotskaia, Anders Sandholm, Katja Filippova
115 · 75 · 0 · 14 Nov 2021

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
228 · 4,460 · 0 · 23 Jan 2020

A Survey on Bias and Fairness in Machine Learning
Ninareh Mehrabi, Fred Morstatter, N. Saxena, Kristina Lerman, Aram Galstyan
SyDa, FaML · 314 · 4,203 · 0 · 23 Aug 2019