Explaining Language Models' Predictions with High-Impact Concepts
arXiv: 2305.02160
3 May 2023
Ruochen Zhao, Shafiq R. Joty, Yongjie Wang, Tan Wang
LRM
Papers citing
"Explaining Language Models' Predictions with High-Impact Concepts"
5 / 5 papers shown
Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
24 Feb 2021
On Completeness-aware Concept-Based Explanations in Deep Neural Networks
Chih-Kuan Yeh, Been Kim, Sercan Ö. Arik, Chun-Liang Li, Tomas Pfister, Pradeep Ravikumar
FAtt
17 Oct 2019
What you can cram into a single vector: Probing sentence embeddings for linguistic properties
Alexis Conneau, Germán Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni
3 May 2018
A causal framework for explaining the predictions of black-box sequence-to-sequence models
David Alvarez-Melis, Tommi Jaakkola
CML
6 Jul 2017
Towards A Rigorous Science of Interpretable Machine Learning
Finale Doshi-Velez, Been Kim
XAI, FaML
28 Feb 2017