Fooling Neural Network Interpretations via Adversarial Model Manipulation

6 February 2019

Papers citing "Fooling Neural Network Interpretations via Adversarial Model Manipulation"

50 / 56 papers shown

Graphical Perception of Saliency-based Model Explanations Yayan Zhao Mingwei Li Matthew Berger XAI FAtt 49 2 0 11 Jun 2024
Explainable Graph Neural Networks Under Fire Zhong Li Simon Geisler Yuhang Wang Stephan Günnemann M. Leeuwen AAML 43 0 0 10 Jun 2024
Manifold Integrated Gradients: Riemannian Geometry for Feature Attribution Eslam Zaher Maciej Trzaskowski Quan Nguyen Fred Roosta AAML 29 4 0 16 May 2024
Robust Explainable Recommendation Sairamvinay Vijayaraghavan Prasant Mohapatra AAML 38 0 0 03 May 2024
Why You Should Not Trust Interpretations in Machine Learning: Adversarial Attacks on Partial Dependence Plots Xi Xin Giles Hooker Fei Huang AAML 46 7 0 29 Apr 2024
CAM-Based Methods Can See through Walls Magamed Taimeskhanov R. Sicre Damien Garreau 28 1 0 02 Apr 2024
Approximate Nullspace Augmented Finetuning for Robust Vision Transformers Haoyang Liu Aditya Singh Yijiang Li Haohan Wang AAML ViT 39 1 0 15 Mar 2024
Are Classification Robustness and Explanation Robustness Really Strongly Correlated? An Analysis Through Input Loss Landscape Tiejin Chen Wenwang Huang Linsey Pang Dongsheng Luo Hua Wei OOD 49 0 0 09 Mar 2024
SoK: Unintended Interactions among Machine Learning Defenses and Risks Vasisht Duddu S. Szyller Nadarajah Asokan AAML 47 2 0 07 Dec 2023
Beyond XAI: Obstacles Towards Responsible AI Yulu Pi 42 2 0 07 Sep 2023
Discriminative Feature Attributions: Bridging Post Hoc Explainability and Inherent Interpretability Usha Bhalla Suraj Srinivas Himabindu Lakkaraju FAtt CML 29 6 0 27 Jul 2023
Single-Class Target-Specific Attack against Interpretable Deep Learning Systems Eldor Abdukhamidov Mohammed Abuhamad George K. Thiruvathukal Hyoungshick Kim Tamer Abuhmed AAML 27 2 0 12 Jul 2023
Robust Ranking Explanations Chao Chen Chenghua Guo Guixiang Ma Ming Zeng Xi Zhang Sihong Xie FAtt AAML 35 0 0 08 Jul 2023
A Vulnerability of Attribution Methods Using Pre-Softmax Scores Miguel A. Lerma Mirtha Lucas FAtt 19 0 0 06 Jul 2023
SpArX: Sparse Argumentative Explanations for Neural Networks [Technical Report] Hamed Ayoobi Nico Potyka Francesca Toni 24 18 0 23 Jan 2023
MoreauGrad: Sparse and Robust Interpretation of Neural Networks via Moreau Envelope Jingwei Zhang Farzan Farnia UQCV 36 3 0 08 Jan 2023
Valid P-Value for Deep Learning-Driven Salient Region Daiki Miwa Vo Nguyen Le Duy I. Takeuchi FAtt AAML 32 14 0 06 Jan 2023
Robust Explanation Constraints for Neural Networks Matthew Wicker Juyeon Heo Luca Costabello Adrian Weller FAtt 29 18 0 16 Dec 2022
Identifying the Source of Vulnerability in Explanation Discrepancy: A Case Study in Neural Text Classification Ruixuan Tang Hanjie Chen Yangfeng Ji AAML FAtt 32 2 0 10 Dec 2022
Interpretation of Neural Networks is Susceptible to Universal Adversarial Perturbations Haniyeh Ehsani Oskouie Farzan Farnia FAtt AAML 22 5 0 30 Nov 2022
Towards More Robust Interpretation via Local Gradient Alignment Sunghwan Joo Seokhyeon Jeong Juyeon Heo Adrian Weller Taesup Moon FAtt 33 5 0 29 Nov 2022
What Makes a Good Explanation?: A Harmonized View of Properties of Explanations Zixi Chen Varshini Subhash Marton Havasi Weiwei Pan Finale Doshi-Velez XAI FAtt 39 18 0 10 Nov 2022
On the Robustness of Explanations of Deep Neural Network Models: A Survey Amlan Jyoti Karthik Balaji Ganesh Manoj Gayala Nandita Lakshmi Tunuguntla Sandesh Kamath V. Balasubramanian XAI FAtt AAML 32 4 0 09 Nov 2022
BOREx: Bayesian-Optimization-Based Refinement of Saliency Map for Image- and Video-Classification Models Atsushi Kikuchi Kotaro Uchida Masaki Waga Kohei Suenaga FAtt 26 1 0 31 Oct 2022
EMaP: Explainable AI with Manifold-based Perturbations Minh Nhat Vu Huy Mai My T. Thai AAML 35 2 0 18 Sep 2022
Inferring Sensitive Attributes from Model Explanations Vasisht Duddu A. Boutet MIACV SILM 24 16 0 21 Aug 2022
Shap-CAM: Visual Explanations for Convolutional Neural Networks based on Shapley Value Quan Zheng Ziwei Wang Jie Zhou Jiwen Lu FAtt 31 31 0 07 Aug 2022
Leveraging Explanations in Interactive Machine Learning: An Overview Stefano Teso Öznur Alkan Wolfgang Stammer Elizabeth M. Daly XAI FAtt LRM 26 62 0 29 Jul 2022
Equivariant and Invariant Grounding for Video Question Answering Yicong Li Xiang Wang Junbin Xiao Tat-Seng Chua 23 25 0 26 Jul 2022
Why we do need Explainable AI for Healthcare Giovanni Cina Tabea E. Rober Rob Goedhart Ilker Birbil 32 14 0 30 Jun 2022
Towards a Theory of Faithfulness: Faithful Explanations of Differentiable Classifiers over Continuous Data Nico Potyka Xiang Yin Francesca Toni FAtt 22 2 0 19 May 2022
Backdooring Explainable Machine Learning Maximilian Noppel Lukas Peter Christian Wressnegger AAML 16 5 0 20 Apr 2022
Anti-Adversarially Manipulated Attributions for Weakly Supervised Semantic Segmentation and Object Localization Jungbeom Lee Eunji Kim J. Mok Sung-Hoon Yoon WSOL 42 29 0 11 Apr 2022
Robustness and Usefulness in AI Explanation Methods Erick Galinkin FAtt 28 1 0 07 Mar 2022
Defense Against Explanation Manipulation Ruixiang Tang Ninghao Liu Fan Yang Na Zou Xia Hu AAML 46 11 0 08 Nov 2021
A Survey on the Robustness of Feature Importance and Counterfactual Explanations Saumitra Mishra Sanghamitra Dutta Jason Long Daniele Magazzeni AAML 16 58 0 30 Oct 2021
AdjointBackMapV2: Precise Reconstruction of Arbitrary CNN Unit's Activation via Adjoint Operators Qing Wan Siu Wun Cheung Yoonsuck Choe 32 0 0 04 Oct 2021
Explaining Bayesian Neural Networks Kirill Bykov Marina M.-C. Höhne Adelaida Creosteanu Klaus-Robert Müller Frederick Klauschen Shinichi Nakajima Marius Kloft BDL AAML 34 25 0 23 Aug 2021
Jointly Attacking Graph Neural Network and its Explanations Wenqi Fan Wei Jin Xiaorui Liu Han Xu Xianfeng Tang Suhang Wang Qing Li Jiliang Tang Jianping Wang Charu C. Aggarwal AAML 42 28 0 07 Aug 2021
Synthetic Benchmarks for Scientific Research in Explainable Machine Learning Yang Liu Sujay Khandagale Colin White Willie Neiswanger 37 65 0 23 Jun 2021
Characterizing the risk of fairwashing Ulrich Aïvodji Hiromi Arai Sébastien Gambs Satoshi Hara 23 27 0 14 Jun 2021
Anti-Adversarially Manipulated Attributions for Weakly and Semi-Supervised Semantic Segmentation Jungbeom Lee Eunji Kim Sungroh Yoon 30 226 0 16 Mar 2021
Do Input Gradients Highlight Discriminative Features? Harshay Shah Prateek Jain Praneeth Netrapalli AAML FAtt 26 57 0 25 Feb 2021
Towards Robust Explanations for Deep Neural Networks Ann-Kathrin Dombrowski Christopher J. Anders K. Müller Pan Kessel FAtt 35 63 0 18 Dec 2020
Visualizing Color-wise Saliency of Black-Box Image Classification Models Yuhki Hatakeyama Hiroki Sakuma Yoshinori Konishi Kohei Suenaga FAtt 22 3 0 06 Oct 2020
What Do You See? Evaluation of Explainable Artificial Intelligence (XAI) Interpretability through Neural Backdoors Yi-Shan Lin Wen-Chuan Lee Z. Berkay Celik XAI 29 93 0 22 Sep 2020
Model extraction from counterfactual explanations Ulrich Aïvodji Alexandre Bolot Sébastien Gambs MIACV MLAU 33 51 0 03 Sep 2020
Can We Trust Your Explanations? Sanity Checks for Interpreters in Android Malware Analysis Ming Fan Wenying Wei Xiaofei Xie Yang Liu X. Guan Ting Liu FAtt AAML 22 36 0 13 Aug 2020
Reliable Post hoc Explanations: Modeling Uncertainty in Explainability Dylan Slack Sophie Hilgard Sameer Singh Himabindu Lakkaraju FAtt 26 162 0 11 Aug 2020
A simple defense against adversarial attacks on heatmap explanations Laura Rieger Lars Kai Hansen FAtt AAML 33 37 0 13 Jul 2020