Understanding Neural Networks through Representation Erasure
  Jiwei Li, Will Monroe, Dan Jurafsky · AAML, MILM · 24 December 2016

Papers citing "Understanding Neural Networks through Representation Erasure"

50 / 144 papers shown
Title
A Comprehensive Analysis of Adversarial Attacks against Spam Filters
A Comprehensive Analysis of Adversarial Attacks against Spam Filters
Esra Hotoğlu
Sevil Sen
Burcu Can
AAML
31
0
0
04 May 2025
Interpreting the Linear Structure of Vision-language Model Embedding Spaces
Interpreting the Linear Structure of Vision-language Model Embedding Spaces
Isabel Papadimitriou
Huangyuan Su
Thomas Fel
Naomi Saphra
Sham Kakade
Stephanie Gil
VLM
56
0
0
16 Apr 2025
Selective Prompt Anchoring for Code Generation
Selective Prompt Anchoring for Code Generation
Yuan Tian
Tianyi Zhang
102
3
0
24 Feb 2025
Automated Trustworthiness Oracle Generation for Machine Learning Text Classifiers
Automated Trustworthiness Oracle Generation for Machine Learning Text Classifiers
Lam Nguyen Tung
Steven Cho
Xiaoning Du
Neelofar Neelofar
Valerio Terragni
Stefano Ruberto
Aldeida Aleti
236
2
0
30 Oct 2024
Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models
Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models
Sepehr Kamahi
Yadollah Yaghoobzadeh
55
0
0
21 Aug 2024
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon
Roi Reichart
42
13
0
27 Jul 2024
Evaluating the Reliability of Self-Explanations in Large Language Models
Evaluating the Reliability of Self-Explanations in Large Language Models
Korbinian Randl
John Pavlopoulos
Aron Henriksson
Tony Lindgren
LRM
52
0
0
19 Jul 2024
Benchmarking the Attribution Quality of Vision Models
Benchmarking the Attribution Quality of Vision Models
Robin Hesse
Simone Schaub-Meyer
Stefan Roth
FAtt
39
3
0
16 Jul 2024
CAVE: Controllable Authorship Verification Explanations
CAVE: Controllable Authorship Verification Explanations
Sahana Ramnath
Kartik Pandey
Elizabeth Boschee
Xiang Ren
66
2
0
24 Jun 2024
Explainability of machine learning approaches in forensic linguistics: a
  case study in geolinguistic authorship profiling
Explainability of machine learning approaches in forensic linguistics: a case study in geolinguistic authorship profiling
Dana Roemling
Yves Scherrer
Aleksandra Miletic
61
0
0
29 Apr 2024
SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering
SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering
Xiaopeng Li
Shasha Li
Shezheng Song
Huijun Liu
Bing Ji
...
Jun Ma
Jie Yu
Xiaodong Liu
Jing Wang
Weimin Zhang
KELM
45
4
0
31 Jan 2024
Navigating the Structured What-If Spaces: Counterfactual Generation via
  Structured Diffusion
Navigating the Structured What-If Spaces: Counterfactual Generation via Structured Diffusion
Nishtha Madaan
Srikanta J. Bedathur
DiffM
38
0
0
21 Dec 2023
Quantifying Uncertainty in Natural Language Explanations of Large
  Language Models
Quantifying Uncertainty in Natural Language Explanations of Large Language Models
Sree Harsha Tanneru
Chirag Agarwal
Himabindu Lakkaraju
LRM
32
14
0
06 Nov 2023
Interpreting Sentiment Composition with Latent Semantic Tree
Interpreting Sentiment Composition with Latent Semantic Tree
Zhongtao Jiang
Yuanzhe Zhang
Cao Liu
Jiansong Chen
Jun Zhao
Kang Liu
CoGe
31
0
0
31 Aug 2023
FunnyBirds: A Synthetic Vision Dataset for a Part-Based Analysis of
  Explainable AI Methods
FunnyBirds: A Synthetic Vision Dataset for a Part-Based Analysis of Explainable AI Methods
Robin Hesse
Simone Schaub-Meyer
Stefan Roth
AAML
37
33
0
11 Aug 2023
Explaining Math Word Problem Solvers
Explaining Math Word Problem Solvers
Abby Newcomb
Jugal Kalita
18
1
0
24 Jul 2023
Explaining How Transformers Use Context to Build Predictions
Explaining How Transformers Use Context to Build Predictions
Javier Ferrando
Gerard I. Gállego
Ioannis Tsiamas
Marta R. Costa-jussá
34
32
0
21 May 2023
Consistent Multi-Granular Rationale Extraction for Explainable Multi-hop
  Fact Verification
Consistent Multi-Granular Rationale Extraction for Explainable Multi-hop Fact Verification
Jiasheng Si
Yingjie Zhu
Deyu Zhou
AAML
52
3
0
16 May 2023
Generating Post-hoc Explanations for Skip-gram-based Node Embeddings by
  Identifying Important Nodes with Bridgeness
Generating Post-hoc Explanations for Skip-gram-based Node Embeddings by Identifying Important Nodes with Bridgeness
Hogun Park
Jennifer Neville
14
4
0
24 Apr 2023
VISION DIFFMASK: Faithful Interpretation of Vision Transformers with
  Differentiable Patch Masking
VISION DIFFMASK: Faithful Interpretation of Vision Transformers with Differentiable Patch Masking
A. Nalmpantis
Apostolos Panagiotopoulos
John Gkountouras
Konstantinos Papakostas
Wilker Aziz
15
4
0
13 Apr 2023
Understanding and Detecting Hallucinations in Neural Machine Translation
  via Model Introspection
Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection
Weijia Xu
Sweta Agrawal
Eleftheria Briakou
Marianna J. Martindale
Marine Carpuat
HILM
27
47
0
18 Jan 2023
BMX: Boosting Natural Language Generation Metrics with Explainability
BMX: Boosting Natural Language Generation Metrics with Explainability
Christoph Leiter
Hoang-Quan Nguyen
Steffen Eger
ELM
24
0
0
20 Dec 2022
Identifying the Source of Vulnerability in Explanation Discrepancy: A
  Case Study in Neural Text Classification
Identifying the Source of Vulnerability in Explanation Discrepancy: A Case Study in Neural Text Classification
Ruixuan Tang
Hanjie Chen
Yangfeng Ji
AAML
FAtt
32
2
0
10 Dec 2022
AutoCAD: Automatically Generating Counterfactuals for Mitigating
  Shortcut Learning
AutoCAD: Automatically Generating Counterfactuals for Mitigating Shortcut Learning
Jiaxin Wen
Yeshuang Zhu
Jinchao Zhang
Jie Zhou
Minlie Huang
CML
AAML
27
8
0
29 Nov 2022
Easy to Decide, Hard to Agree: Reducing Disagreements Between Saliency
  Methods
Easy to Decide, Hard to Agree: Reducing Disagreements Between Saliency Methods
Josip Jukić
Martin Tutek
Jan Snajder
FAtt
31
0
0
15 Nov 2022
ViT-CX: Causal Explanation of Vision Transformers
ViT-CX: Causal Explanation of Vision Transformers
Weiyan Xie
Xiao-hui Li
Caleb Chen Cao
Nevin L.Zhang
ViT
37
17
0
06 Nov 2022
Unsupervised Text Deidentification
Unsupervised Text Deidentification
John X. Morris
Justin T. Chiu
Ramin Zabih
Alexander M. Rush
29
7
0
20 Oct 2022
On the Explainability of Natural Language Processing Deep Models
On the Explainability of Natural Language Processing Deep Models
Julia El Zini
M. Awad
31
82
0
13 Oct 2022
AD-DROP: Attribution-Driven Dropout for Robust Language Model
  Fine-Tuning
AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning
Tao Yang
Jinghao Deng
Xiaojun Quan
Qifan Wang
Shaoliang Nie
32
3
0
12 Oct 2022
U3E: Unsupervised and Erasure-based Evidence Extraction for Machine
  Reading Comprehension
U3E: Unsupervised and Erasure-based Evidence Extraction for Machine Reading Comprehension
Suzhe He
Shumin Shi
Chenghao Wu
46
0
0
06 Oct 2022
Global Concept-Based Interpretability for Graph Neural Networks via
  Neuron Analysis
Global Concept-Based Interpretability for Graph Neural Networks via Neuron Analysis
Xuanyuan Han
Pietro Barbiero
Dobrik Georgiev
Lucie Charlotte Magister
Pietro Lio
MILM
42
41
0
22 Aug 2022
A Novel Plug-and-Play Approach for Adversarially Robust Generalization
A Novel Plug-and-Play Approach for Adversarially Robust Generalization
Deepak Maurya
Adarsh Barik
Jean Honorio
OOD
AAML
46
0
0
19 Aug 2022
ferret: a Framework for Benchmarking Explainers on Transformers
ferret: a Framework for Benchmarking Explainers on Transformers
Giuseppe Attanasio
Eliana Pastor
C. Bonaventura
Debora Nozza
33
30
0
02 Aug 2022
An Interpretability Evaluation Benchmark for Pre-trained Language Models
An Interpretability Evaluation Benchmark for Pre-trained Language Models
Ya-Ming Shen
Lijie Wang
Ying-Cong Chen
Xinyan Xiao
Jing Liu
Hua Wu
39
4
0
28 Jul 2022
Explainable Artificial Intelligence (XAI) for Internet of Things: A
  Survey
Explainable Artificial Intelligence (XAI) for Internet of Things: A Survey
İbrahim Kök
Feyza Yıldırım Okay
Özgecan Muyanlı
S. Özdemir
XAI
35
51
0
07 Jun 2022
Learning to Ignore Adversarial Attacks
Learning to Ignore Adversarial Attacks
Yiming Zhang
Yan Zhou
Samuel Carton
Chenhao Tan
59
2
0
23 May 2022
A Fine-grained Interpretability Evaluation Benchmark for Neural NLP
A Fine-grained Interpretability Evaluation Benchmark for Neural NLP
Lijie Wang
Yaozong Shen
Shu-ping Peng
Shuai Zhang
Xinyan Xiao
Hao Liu
Hongxuan Tang
Ying-Cong Chen
Hua Wu
Haifeng Wang
ELM
19
21
0
23 May 2022
The Solvability of Interpretability Evaluation Metrics
The Solvability of Interpretability Evaluation Metrics
Yilun Zhou
J. Shah
76
8
0
18 May 2022
It Takes Two Flints to Make a Fire: Multitask Learning of Neural
  Relation and Explanation Classifiers
It Takes Two Flints to Make a Fire: Multitask Learning of Neural Relation and Explanation Classifiers
Zheng Tang
Mihai Surdeanu
27
6
0
25 Apr 2022
How Pre-trained Language Models Capture Factual Knowledge? A
  Causal-Inspired Analysis
How Pre-trained Language Models Capture Factual Knowledge? A Causal-Inspired Analysis
Shaobo Li
Xiaoguang Li
Lifeng Shang
Zhenhua Dong
Chengjie Sun
Bingquan Liu
Zhenzhou Ji
Xin Jiang
Qun Liu
KELM
34
53
0
31 Mar 2022
Controlling the Focus of Pretrained Language Generation Models
Controlling the Focus of Pretrained Language Generation Models
Jiabao Ji
Yoon Kim
James R. Glass
Tianxing He
38
5
0
02 Mar 2022
Interpreting Language Models with Contrastive Explanations
Interpreting Language Models with Contrastive Explanations
Kayo Yin
Graham Neubig
MILM
23
78
0
21 Feb 2022
A Latent-Variable Model for Intrinsic Probing
A Latent-Variable Model for Intrinsic Probing
Karolina Stañczak
Lucas Torroba Hennigen
Adina Williams
Ryan Cotterell
Isabelle Augenstein
29
4
0
20 Jan 2022
UNIREX: A Unified Learning Framework for Language Model Rationale
  Extraction
UNIREX: A Unified Learning Framework for Language Model Rationale Extraction
Aaron Chan
Maziar Sanjabi
Lambert Mathias
L Tan
Shaoliang Nie
Xiaochang Peng
Xiang Ren
Hamed Firooz
43
42
0
16 Dec 2021
Quantifying and Understanding Adversarial Examples in Discrete Input
  Spaces
Quantifying and Understanding Adversarial Examples in Discrete Input Spaces
Volodymyr Kuleshov
Evgenii Nikishin
S. Thakoor
Tingfung Lau
Stefano Ermon
AAML
27
1
0
12 Dec 2021
MTV: Visual Analytics for Detecting, Investigating, and Annotating
  Anomalies in Multivariate Time Series
MTV: Visual Analytics for Detecting, Investigating, and Annotating Anomalies in Multivariate Time Series
Dongyu Liu
Sarah Alnegheimish
Alexandra Zytek
K. Veeramachaneni
AI4TS
27
20
0
10 Dec 2021
Scaling Up Influence Functions
Scaling Up Influence Functions
Andrea Schioppa
Polina Zablotskaia
David Vilar
Artem Sokolov
TDI
33
91
0
06 Dec 2021
Explainable Deep Learning in Healthcare: A Methodological Survey from an
  Attribution View
Explainable Deep Learning in Healthcare: A Methodological Survey from an Attribution View
Di Jin
Elena Sergeeva
W. Weng
Geeticka Chauhan
Peter Szolovits
OOD
47
55
0
05 Dec 2021
Triggerless Backdoor Attack for NLP Tasks with Clean Labels
Triggerless Backdoor Attack for NLP Tasks with Clean Labels
Leilei Gan
Jiwei Li
Tianwei Zhang
Xiaoya Li
Yuxian Meng
Fei Wu
Yi Yang
Shangwei Guo
Chun Fan
AAML
SILM
27
74
0
15 Nov 2021
Counterfactual Explanations for Models of Code
Counterfactual Explanations for Models of Code
Jürgen Cito
Işıl Dillig
V. Murali
S. Chandra
AAML
LRM
32
48
0
10 Nov 2021