Interpretability Guarantees with Merlin-Arthur Classifiers

1 June 2022

Papers citing "Interpretability Guarantees with Merlin-Arthur Classifiers"

Title | Authors | Tags | Counts | Date
Training Characteristic Functions with Reinforcement Learning: XAI-methods play Connect Four | S. Wäldchen, Felix Huber, Sebastian Pokutta | FAtt | 28, 8, 0 | 23 Feb 2022
Interpretable Neural Networks with Frank-Wolfe: Sparse Relevance Maps and Relevance Orderings | Jan Macdonald, Mathieu Besançon, Sebastian Pokutta | - | 32, 11, 0 | 15 Oct 2021
Invariant Rationalization | Shiyu Chang, Yang Zhang, Mo Yu, Tommi Jaakkola | - | 179, 201, 0 | 22 Mar 2020
A Survey on Bias and Fairness in Machine Learning | Ninareh Mehrabi, Fred Morstatter, N. Saxena, Kristina Lerman, Aram Galstyan | SyDa, FaML | 323, 4,212, 0 | 23 Aug 2019
AI safety via debate | G. Irving, Paul Christiano, Dario Amodei | - | 204, 200, 0 | 02 May 2018