ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness?
Alon Jacovi, Yoav Goldberg
7 April 2020 · arXiv:2004.03685 · XAI

Papers citing "Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness?"

50 / 130 papers shown
From Pixels to Perception: Interpretable Predictions via Instance-wise Grouped Feature Selection
Moritz Vandenhirtz, Julia E. Vogt · 09 May 2025

Reasoning Models Don't Always Say What They Think
Yanda Chen, Joe Benton, Ansh Radhakrishnan, Jonathan Uesato, Carson E. Denison, ..., Vlad Mikulik, Samuel R. Bowman, Jan Leike, Jared Kaplan, E. Perez · ReLM, LRM · 08 May 2025

Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets
W. Liu, Zhongyu Niu, Lang Gao, Zhiying Deng, Jun Wang, H. Wang, Ruixuan Li · 04 May 2025

PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications
Trisanth Srinivasan, Santosh Patapati · 03 May 2025

Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods
Mahdi Dhaini, Ege Erdogan, Nils Feldhus, Gjergji Kasneci · 02 May 2025

A constraints-based approach to fully interpretable neural networks for detecting learner behaviors
Juan D. Pinto, Luc Paquette · 10 Apr 2025

Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
Bowen Baker, Joost Huizinga, Leo Gao, Zehao Dou, M. Guan, Aleksander Mądry, Wojciech Zaremba, J. Pachocki, David Farhi · LRM · 14 Mar 2025

A Unified Framework with Novel Metrics for Evaluating the Effectiveness of XAI Techniques in LLMs
Melkamu Mersha, Mesay Gemeda Yigezu, Hassan Shakil, Ali Al shami, SangHyun Byun, Jugal Kalita · 06 Mar 2025

Beyond Translation: LLM-Based Data Generation for Multilingual Fact-Checking
Yi-Ling Chung, Aurora Cobo, Pablo Serna · SyDa, HILM · 24 Feb 2025

A Survey of Model Architectures in Information Retrieval
Zhichao Xu, Fengran Mo, Zhiqi Huang, Crystina Zhang, Puxuan Yu, Bei Wang, Jimmy J. Lin, Vivek Srikumar · KELM, 3DV · 21 Feb 2025

A Study of the Plausibility of Attention between RNN Encoders in Natural Language Inference
Duc Hau Nguyen, Pascale Sébillot · 23 Jan 2025

Regularization, Semi-supervision, and Supervision for a Plausible Attention-Based Explanation
Duc Hau Nguyen, Cyrielle Mallart, Guillaume Gravier, Pascale Sébillot · 22 Jan 2025

A Tale of Two Imperatives: Privacy and Explainability
Supriya Manna, Niladri Sett · 30 Dec 2024

SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
Dennis Fucci, Marco Gaido, Beatrice Savoldi, Matteo Negri, Mauro Cettolo, L. Bentivogli · 03 Nov 2024

Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination
Jerry Huang, Prasanna Parthasarathi, Mehdi Rezagholizadeh, Boxing Chen, Sarath Chandar · 22 Oct 2024

FLARE: Faithful Logic-Aided Reasoning and Exploration
Erik Arakelyan, Pasquale Minervini, Pat Verga, Patrick Lewis, Isabelle Augenstein · ReLM, LRM · 14 Oct 2024

F-Fidelity: A Robust Framework for Faithfulness Evaluation of Explainable AI
Xu Zheng, Farhad Shirani, Zhuomin Chen, Chaohao Lin, Wei Cheng, Wenbo Guo, Dongsheng Luo · AAML · 03 Oct 2024

COOL: Efficient and Reliable Chain-Oriented Objective Logic with Neural Networks Feedback Control for Program Synthesis
Jipeng Han · 02 Oct 2024

Explainable AI needs formal notions of explanation correctness
Stefan Haufe, Rick Wilming, Benedict Clark, Rustam Zhumagambetov, Danny Panknin, Ahcène Boubekki · XAI · 22 Sep 2024

Prompts Are Programs Too! Understanding How Developers Build Software Containing Prompts
Jenny T Liang, Melissa Lin, Nikitha Rao, Brad A. Myers · 19 Sep 2024

Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models
Sepehr Kamahi, Yadollah Yaghoobzadeh · 21 Aug 2024

Visual Agents as Fast and Slow Thinkers
Guangyan Sun, Mingyu Jin, Zhenting Wang, Cheng-Long Wang, Siqi Ma, Qifan Wang, Ying Nian Wu, Dongfang Liu · LLMAG, LRM · 16 Aug 2024

Data Debugging is NP-hard for Classifiers Trained with SGD
Zizheng Guo, Pengyu Chen, Yanzhang Fu, Xuelong Li · 02 Aug 2024

Faithful and Plausible Natural Language Explanations for Image Classification: A Pipeline Approach
Adam Wojciechowski, Mateusz Lango, Ondrej Dusek · FAtt · 30 Jul 2024

On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon, Roi Reichart · 27 Jul 2024

Exploring the Plausibility of Hate and Counter Speech Detectors with Explainable AI
Adrian Jaques Böck, D. Slijepcevic, Matthias Zeppelzauer · 25 Jul 2024

Transformer Circuit Faithfulness Metrics are not Robust
Joseph Miller, Bilal Chughtai, William Saunders · 11 Jul 2024

A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao · 02 Jul 2024

Evaluating Human Alignment and Model Faithfulness of LLM Rationale
Mohsen Fayyaz, Fan Yin, Jiao Sun, Nanyun Peng · 28 Jun 2024

Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Letitia Parcalabescu, Anette Frank · MLLM, CoGe, VLM · 29 Apr 2024

RankingSHAP -- Listwise Feature Attribution Explanations for Ranking Models
Maria Heuss, Maarten de Rijke, Avishek Anand · 24 Mar 2024

Best of Both Worlds: A Pliable and Generalizable Neuro-Symbolic Approach for Relation Classification
Robert Vacareanu, F. Alam, M. Islam, Haris Riaz, Mihai Surdeanu · NAI · 05 Mar 2024

B-Cos Aligned Transformers Learn Human-Interpretable Features
Manuel Tran, Amal Lahiani, Yashin Dicente Cid, Melanie Boxberg, Peter Lienemann, C. Matek, S. J. Wagner, Fabian J. Theis, Eldad Klaiman, Tingying Peng · MedIm, ViT · 16 Jan 2024

Evaluating Language Model Agency through Negotiations
Tim R. Davidson, V. Veselovsky, Martin Josifoski, Maxime Peyrard, Antoine Bosselut, Michal Kosinski, Robert West · LLMAG · 09 Jan 2024

ALMANACS: A Simulatability Benchmark for Language Model Explainability
Edmund Mills, Shiye Su, Stuart J. Russell, Scott Emmons · 20 Dec 2023

The Problem of Coherence in Natural Language Explanations of Recommendations
Jakub Raczynski, Mateusz Lango, Jerzy Stefanowski · 18 Dec 2023

A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia
Giovanni Monea, Maxime Peyrard, Martin Josifoski, Vishrav Chaudhary, Jason Eisner, Emre Kiciman, Hamid Palangi, Barun Patra, Robert West · KELM · 04 Dec 2023

Improving Interpretation Faithfulness for Vision Transformers
Lijie Hu, Yixin Liu, Ninghao Liu, Mengdi Huai, Lichao Sun, Di Wang · 29 Nov 2023

Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning
Chong Li, Shaonan Wang, Yunhao Zhang, Jiajun Zhang, Chengqing Zong · 16 Oct 2023

Evaluating Explanation Methods for Vision-and-Language Navigation
Guanqi Chen, Lei Yang, Guanhua Chen, Jia Pan · XAI · 10 Oct 2023

DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
Anna Langedijk, Hosein Mohebbi, Gabriele Sarti, Willem H. Zuidema, Jaap Jumelet · 05 Oct 2023

Situated Natural Language Explanations
Zining Zhu, Hao Jiang, Jingfeng Yang, Sreyashi Nag, Chao Zhang, Jie Huang, Yifan Gao, Frank Rudzicz, Bing Yin · LRM · 27 Aug 2023

Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?
Ari Holtzman, Peter West, Luke Zettlemoyer · AI4CE · 31 Jul 2023

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen McKeown · LRM · 17 Jul 2023

DARE: Towards Robust Text Explanations in Biomedical and Healthcare Applications
Adam Ivankay, Mattia Rigotti, P. Frossard · OOD, MedIm · 05 Jul 2023

AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap
Q. V. Liao, J. Vaughan · 02 Jun 2023

MaNtLE: Model-agnostic Natural Language Explainer
Rakesh R Menon, Kerem Zaman, Shashank Srivastava · FAtt, LRM · 22 May 2023

VISION DIFFMASK: Faithful Interpretation of Vision Transformers with Differentiable Patch Masking
A. Nalmpantis, Apostolos Panagiotopoulos, John Gkountouras, Konstantinos Papakostas, Wilker Aziz · 13 Apr 2023

Faithful Chain-of-Thought Reasoning
Qing Lyu, Shreya Havaldar, Adam Stein, Li Zhang, D. Rao, Eric Wong, Marianna Apidianaki, Chris Callison-Burch · ReLM, LRM · 31 Jan 2023

The State of Human-centered NLP Technology for Fact-checking
Anubrata Das, Houjiang Liu, Venelin Kovatchev, Matthew Lease · HILM · 08 Jan 2023