Understanding Neural Networks through Representation Erasure
  Jiwei Li, Will Monroe, Dan Jurafsky · AAML, MILM · 24 December 2016

Papers citing "Understanding Neural Networks through Representation Erasure"

50 / 144 papers shown
Title
A Comprehensive Analysis of Adversarial Attacks against Spam Filters
A Comprehensive Analysis of Adversarial Attacks against Spam Filters
Esra Hotoğlu
Sevil Sen
Burcu Can
AAML
31
0
0
04 May 2025
Interpreting the Linear Structure of Vision-language Model Embedding Spaces
Interpreting the Linear Structure of Vision-language Model Embedding Spaces
Isabel Papadimitriou
Huangyuan Su
Thomas Fel
Naomi Saphra
Sham Kakade
Stephanie Gil
VLM
56
0
0
16 Apr 2025
Selective Prompt Anchoring for Code Generation
Selective Prompt Anchoring for Code Generation
Yuan Tian
Tianyi Zhang
102
3
0
24 Feb 2025
Automated Trustworthiness Oracle Generation for Machine Learning Text Classifiers
Automated Trustworthiness Oracle Generation for Machine Learning Text Classifiers
Lam Nguyen Tung
Steven Cho
Xiaoning Du
Neelofar Neelofar
Valerio Terragni
Stefano Ruberto
Aldeida Aleti
236
2
0
30 Oct 2024
Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models
Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models
Sepehr Kamahi
Yadollah Yaghoobzadeh
55
0
0
21 Aug 2024
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon
Roi Reichart
42
13
0
27 Jul 2024
Evaluating the Reliability of Self-Explanations in Large Language Models
Evaluating the Reliability of Self-Explanations in Large Language Models
Korbinian Randl
John Pavlopoulos
Aron Henriksson
Tony Lindgren
LRM
52
0
0
19 Jul 2024
Benchmarking the Attribution Quality of Vision Models
Benchmarking the Attribution Quality of Vision Models
Robin Hesse
Simone Schaub-Meyer
Stefan Roth
FAtt
39
3
0
16 Jul 2024
CAVE: Controllable Authorship Verification Explanations
CAVE: Controllable Authorship Verification Explanations
Sahana Ramnath
Kartik Pandey
Elizabeth Boschee
Xiang Ren
66
2
0
24 Jun 2024
Explainability of machine learning approaches in forensic linguistics: a
  case study in geolinguistic authorship profiling
Explainability of machine learning approaches in forensic linguistics: a case study in geolinguistic authorship profiling
Dana Roemling
Yves Scherrer
Aleksandra Miletic
61
0
0
29 Apr 2024
SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering
SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering
Xiaopeng Li
Shasha Li
Shezheng Song
Huijun Liu
Bing Ji
...
Jun Ma
Jie Yu
Xiaodong Liu
Jing Wang
Weimin Zhang
KELM
45
4
0
31 Jan 2024
Navigating the Structured What-If Spaces: Counterfactual Generation via
  Structured Diffusion
Navigating the Structured What-If Spaces: Counterfactual Generation via Structured Diffusion
Nishtha Madaan
Srikanta J. Bedathur
DiffM
38
0
0
21 Dec 2023
Quantifying Uncertainty in Natural Language Explanations of Large
  Language Models
Quantifying Uncertainty in Natural Language Explanations of Large Language Models
Sree Harsha Tanneru
Chirag Agarwal
Himabindu Lakkaraju
LRM
32
14
0
06 Nov 2023
Interpreting Sentiment Composition with Latent Semantic Tree
Interpreting Sentiment Composition with Latent Semantic Tree
Zhongtao Jiang
Yuanzhe Zhang
Cao Liu
Jiansong Chen
Jun Zhao
Kang Liu
CoGe
31
0
0
31 Aug 2023
FunnyBirds: A Synthetic Vision Dataset for a Part-Based Analysis of
  Explainable AI Methods
FunnyBirds: A Synthetic Vision Dataset for a Part-Based Analysis of Explainable AI Methods
Robin Hesse
Simone Schaub-Meyer
Stefan Roth
AAML
37
33
0
11 Aug 2023
Explaining Math Word Problem Solvers
Explaining Math Word Problem Solvers
Abby Newcomb
Jugal Kalita
18
1
0
24 Jul 2023
Explaining How Transformers Use Context to Build Predictions
Explaining How Transformers Use Context to Build Predictions
Javier Ferrando
Gerard I. Gállego
Ioannis Tsiamas
Marta R. Costa-jussá
34
32
0
21 May 2023
Consistent Multi-Granular Rationale Extraction for Explainable Multi-hop
  Fact Verification
Consistent Multi-Granular Rationale Extraction for Explainable Multi-hop Fact Verification
Jiasheng Si
Yingjie Zhu
Deyu Zhou
AAML
52
3
0
16 May 2023
Generating Post-hoc Explanations for Skip-gram-based Node Embeddings by
  Identifying Important Nodes with Bridgeness
Generating Post-hoc Explanations for Skip-gram-based Node Embeddings by Identifying Important Nodes with Bridgeness
Hogun Park
Jennifer Neville
14
4
0
24 Apr 2023
VISION DIFFMASK: Faithful Interpretation of Vision Transformers with
  Differentiable Patch Masking
VISION DIFFMASK: Faithful Interpretation of Vision Transformers with Differentiable Patch Masking
A. Nalmpantis
Apostolos Panagiotopoulos
John Gkountouras
Konstantinos Papakostas
Wilker Aziz
15
4
0
13 Apr 2023
Understanding and Detecting Hallucinations in Neural Machine Translation
  via Model Introspection
Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection
Weijia Xu
Sweta Agrawal
Eleftheria Briakou
Marianna J. Martindale
Marine Carpuat
HILM
27
47
0
18 Jan 2023
BMX: Boosting Natural Language Generation Metrics with Explainability
BMX: Boosting Natural Language Generation Metrics with Explainability
Christoph Leiter
Hoang-Quan Nguyen
Steffen Eger
ELM
24
0
0
20 Dec 2022
Identifying the Source of Vulnerability in Explanation Discrepancy: A
  Case Study in Neural Text Classification
Identifying the Source of Vulnerability in Explanation Discrepancy: A Case Study in Neural Text Classification
Ruixuan Tang
Hanjie Chen
Yangfeng Ji
AAML
FAtt
32
2
0
10 Dec 2022
AutoCAD: Automatically Generating Counterfactuals for Mitigating
  Shortcut Learning
AutoCAD: Automatically Generating Counterfactuals for Mitigating Shortcut Learning
Jiaxin Wen
Yeshuang Zhu
Jinchao Zhang
Jie Zhou
Minlie Huang
CML
AAML
27
8
0
29 Nov 2022
Easy to Decide, Hard to Agree: Reducing Disagreements Between Saliency
  Methods
Easy to Decide, Hard to Agree: Reducing Disagreements Between Saliency Methods
Josip Jukić
Martin Tutek
Jan Snajder
FAtt
31
0
0
15 Nov 2022
ViT-CX: Causal Explanation of Vision Transformers
ViT-CX: Causal Explanation of Vision Transformers
Weiyan Xie
Xiao-hui Li
Caleb Chen Cao
Nevin L.Zhang
ViT
37
17
0
06 Nov 2022
Unsupervised Text Deidentification
Unsupervised Text Deidentification
John X. Morris
Justin T. Chiu
Ramin Zabih
Alexander M. Rush
29
7
0
20 Oct 2022
On the Explainability of Natural Language Processing Deep Models
On the Explainability of Natural Language Processing Deep Models
Julia El Zini
M. Awad
31
82
0
13 Oct 2022
AD-DROP: Attribution-Driven Dropout for Robust Language Model
  Fine-Tuning
AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning
Tao Yang
Jinghao Deng
Xiaojun Quan
Qifan Wang
Shaoliang Nie
32
3
0
12 Oct 2022
U3E: Unsupervised and Erasure-based Evidence Extraction for Machine
  Reading Comprehension
U3E: Unsupervised and Erasure-based Evidence Extraction for Machine Reading Comprehension
Suzhe He
Shumin Shi
Chenghao Wu
46
0
0
06 Oct 2022
Global Concept-Based Interpretability for Graph Neural Networks via
  Neuron Analysis
Global Concept-Based Interpretability for Graph Neural Networks via Neuron Analysis
Xuanyuan Han
Pietro Barbiero
Dobrik Georgiev
Lucie Charlotte Magister
Pietro Lio
MILM
42
41
0
22 Aug 2022
A Novel Plug-and-Play Approach for Adversarially Robust Generalization
A Novel Plug-and-Play Approach for Adversarially Robust Generalization
Deepak Maurya
Adarsh Barik
Jean Honorio
OOD
AAML
46
0
0
19 Aug 2022
ferret: a Framework for Benchmarking Explainers on Transformers
ferret: a Framework for Benchmarking Explainers on Transformers
Giuseppe Attanasio
Eliana Pastor
C. Bonaventura
Debora Nozza
33
30
0
02 Aug 2022
An Interpretability Evaluation Benchmark for Pre-trained Language Models
An Interpretability Evaluation Benchmark for Pre-trained Language Models
Ya-Ming Shen
Lijie Wang
Ying-Cong Chen
Xinyan Xiao
Jing Liu
Hua Wu
39
4
0
28 Jul 2022
Explainable Artificial Intelligence (XAI) for Internet of Things: A
  Survey
Explainable Artificial Intelligence (XAI) for Internet of Things: A Survey
İbrahim Kök
Feyza Yıldırım Okay
Özgecan Muyanlı
S. Özdemir
XAI
35
51
0
07 Jun 2022
Learning to Ignore Adversarial Attacks
Learning to Ignore Adversarial Attacks
Yiming Zhang
Yan Zhou
Samuel Carton
Chenhao Tan
59
2
0
23 May 2022
A Fine-grained Interpretability Evaluation Benchmark for Neural NLP
A Fine-grained Interpretability Evaluation Benchmark for Neural NLP
Lijie Wang
Yaozong Shen
Shu-ping Peng
Shuai Zhang
Xinyan Xiao
Hao Liu
Hongxuan Tang
Ying-Cong Chen
Hua Wu
Haifeng Wang
ELM
19
21
0
23 May 2022
The Solvability of Interpretability Evaluation Metrics
The Solvability of Interpretability Evaluation Metrics
Yilun Zhou
J. Shah
76
8
0
18 May 2022
It Takes Two Flints to Make a Fire: Multitask Learning of Neural
  Relation and Explanation Classifiers
It Takes Two Flints to Make a Fire: Multitask Learning of Neural Relation and Explanation Classifiers
Zheng Tang
Mihai Surdeanu
27
6
0
25 Apr 2022
How Pre-trained Language Models Capture Factual Knowledge? A
  Causal-Inspired Analysis
How Pre-trained Language Models Capture Factual Knowledge? A Causal-Inspired Analysis
Shaobo Li
Xiaoguang Li
Lifeng Shang
Zhenhua Dong
Chengjie Sun
Bingquan Liu
Zhenzhou Ji
Xin Jiang
Qun Liu
KELM
34
53
0
31 Mar 2022
Controlling the Focus of Pretrained Language Generation Models
Controlling the Focus of Pretrained Language Generation Models
Jiabao Ji
Yoon Kim
James R. Glass
Tianxing He
38
5
0
02 Mar 2022
Interpreting Language Models with Contrastive Explanations
Interpreting Language Models with Contrastive Explanations
Kayo Yin
Graham Neubig
MILM
23
78
0
21 Feb 2022
A Latent-Variable Model for Intrinsic Probing
A Latent-Variable Model for Intrinsic Probing
Karolina Stañczak
Lucas Torroba Hennigen
Adina Williams
Ryan Cotterell
Isabelle Augenstein
29
4
0
20 Jan 2022
UNIREX: A Unified Learning Framework for Language Model Rationale
  Extraction
UNIREX: A Unified Learning Framework for Language Model Rationale Extraction
Aaron Chan
Maziar Sanjabi
Lambert Mathias
L Tan
Shaoliang Nie
Xiaochang Peng
Xiang Ren
Hamed Firooz
43
42
0
16 Dec 2021
Quantifying and Understanding Adversarial Examples in Discrete Input
  Spaces
Quantifying and Understanding Adversarial Examples in Discrete Input Spaces
Volodymyr Kuleshov
Evgenii Nikishin
S. Thakoor
Tingfung Lau
Stefano Ermon
AAML
27
1
0
12 Dec 2021
MTV: Visual Analytics for Detecting, Investigating, and Annotating
  Anomalies in Multivariate Time Series
MTV: Visual Analytics for Detecting, Investigating, and Annotating Anomalies in Multivariate Time Series
Dongyu Liu
Sarah Alnegheimish
Alexandra Zytek
K. Veeramachaneni
AI4TS
27
20
0
10 Dec 2021
Scaling Up Influence Functions
Scaling Up Influence Functions
Andrea Schioppa
Polina Zablotskaia
David Vilar
Artem Sokolov
TDI
33
91
0
06 Dec 2021
Explainable Deep Learning in Healthcare: A Methodological Survey from an
  Attribution View
Explainable Deep Learning in Healthcare: A Methodological Survey from an Attribution View
Di Jin
Elena Sergeeva
W. Weng
Geeticka Chauhan
Peter Szolovits
OOD
47
55
0
05 Dec 2021
Triggerless Backdoor Attack for NLP Tasks with Clean Labels
Triggerless Backdoor Attack for NLP Tasks with Clean Labels
Leilei Gan
Jiwei Li
Tianwei Zhang
Xiaoya Li
Yuxian Meng
Fei Wu
Yi Yang
Shangwei Guo
Chun Fan
AAML
SILM
27
74
0
15 Nov 2021
Counterfactual Explanations for Models of Code
Counterfactual Explanations for Models of Code
Jürgen Cito
Işıl Dillig
V. Murali
S. Chandra
AAML
LRM
32
48
0
10 Nov 2021