arXiv:2006.14032 (v2, latest)
Compositional Explanations of Neurons
Neural Information Processing Systems (NeurIPS), 2020
24 June 2020
Jesse Mu
Jacob Andreas
FAtt
CoGe
MILM
Papers citing "Compositional Explanations of Neurons"
46 / 146 papers shown
Diagnostics for Deep Neural Networks with Automated Copy/Paste Attacks
Stephen Casper
K. Hariharan
Dylan Hadfield-Menell
AAML
18 Nov 2022
Finding Skill Neurons in Pre-trained Transformer-based Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Xiaozhi Wang
Kaiyue Wen
Zhengyan Zhang
Lei Hou
Zhiyuan Liu
Juanzi Li
MILM
MoE
14 Nov 2022
New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound
Neural Information Processing Systems (NeurIPS), 2022
Arushi Gupta
Nikunj Saunshi
Dingli Yu
Kaifeng Lyu
Sanjeev Arora
AAML
FAtt
XAI
05 Nov 2022
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
International Conference on Learning Representations (ICLR), 2022
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
01 Nov 2022
Post-hoc analysis of Arabic transformer models
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2022
Ahmed Abdelali
Nadir Durrani
Fahim Dalvi
Hassan Sajjad
18 Oct 2022
Global Concept-Based Interpretability for Graph Neural Networks via Neuron Analysis
AAAI Conference on Artificial Intelligence (AAAI), 2022
Xuanyuan Han
Pietro Barbiero
Dobrik Georgiev
Lucie Charlotte Magister
Pietro Lio
MILM
22 Aug 2022
Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks
Tilman Raukur
A. Ho
Stephen Casper
Dylan Hadfield-Menell
AAML
AI4CE
27 Jul 2022
Interpretable by Design: Learning Predictors by Composing Interpretable Queries
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Aditya Chattopadhyay
Stewart Slocum
B. Haeffele
René Vidal
D. Geman
03 Jul 2022
Analyzing Encoded Concepts in Transformer Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Hassan Sajjad
Nadir Durrani
Fahim Dalvi
Firoj Alam
A. Khan
Jia Xu
27 Jun 2022
Discovering Salient Neurons in Deep NLP Models
Journal of Machine Learning Research (JMLR), 2022
Nadir Durrani
Fahim Dalvi
Hassan Sajjad
KELM
MILM
27 Jun 2022
Coupling Visual Semantics of Artificial Neural Networks and Human Brain Function via Synchronized Activations
IEEE Transactions on Cognitive and Developmental Systems (IEEE TCDS), 2022
Lin Zhao
Haixing Dai
Zihao Wu
Zhe Xiao
Lu Zhang
...
Xiaoyan Cai
Xi Jiang
Sheng Li
Dajiang Zhu
Tianming Liu
22 Jun 2022
DORA: Exploring Outlier Representations in Deep Neural Networks
Kirill Bykov
Mayukh Deb
Dennis Grinwald
Klaus-Robert Muller
Marina M.-C. Höhne
09 Jun 2022
Pruning for Feature-Preserving Circuits in CNNs
Christopher Hamblin
Talia Konkle
G. Alvarez
03 Jun 2022
CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks
International Conference on Learning Representations (ICLR), 2022
Tuomas P. Oikarinen
Tsui-Wei Weng
VLM
23 Apr 2022
Learning to Scaffold: Optimizing Model Explanations for Teaching
Neural Information Processing Systems (NeurIPS), 2022
Patrick Fernandes
Marcos Vinícius Treviso
Danish Pruthi
André F. T. Martins
Graham Neubig
FAtt
22 Apr 2022
HINT: Hierarchical Neuron Concept Explainer
Computer Vision and Pattern Recognition (CVPR), 2022
Andong Wang
Wei-Ning Lee
Xiaojuan Qi
27 Mar 2022
Towards Explainable Evaluation Metrics for Natural Language Generation
Christoph Leiter
Piyawat Lertvittayakumjorn
M. Fomicheva
Wei Zhao
Yang Gao
Steffen Eger
AAML
ELM
21 Mar 2022
Natural Language Descriptions of Deep Visual Features
International Conference on Learning Representations (ICLR), 2022
Evan Hernandez
Sarah Schwettmann
David Bau
Teona Bagashvili
Antonio Torralba
Jacob Andreas
MILM
26 Jan 2022
From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI
ACM Computing Surveys (ACM CSUR), 2022
Meike Nauta
Jan Trienes
Shreyasi Pathak
Elisa Nguyen
Michelle Peters
Yasmin Schmitt
Jorg Schlotterer
M. V. Keulen
C. Seifert
ELM
XAI
20 Jan 2022
A Latent-Variable Model for Intrinsic Probing
AAAI Conference on Artificial Intelligence (AAAI), 2022
Karolina Stańczak
Lucas Torroba Hennigen
Adina Williams
Robert Bamler
Isabelle Augenstein
20 Jan 2022
Interpreting Arabic Transformer Models
Ahmed Abdelali
Nadir Durrani
Fahim Dalvi
Hassan Sajjad
19 Jan 2022
Forward Composition Propagation for Explainable Neural Reasoning
IEEE Computational Intelligence Magazine (IEEE CIM), 2021
Isel Grau
Gonzalo Nápoles
M. Bello
Yamisleydi Salgueiro
A. Jastrzębska
23 Dec 2021
Can Explanations Be Useful for Calibrating Black Box Models?
Xi Ye
Greg Durrett
FAtt
14 Oct 2021
Quantifying Local Specialization in Deep Neural Networks
Shlomi Hod
Daniel Filan
Stephen Casper
Andrew Critch
Stuart J. Russell
13 Oct 2021
Robust Feature-Level Adversaries are Interpretability Tools
Stephen Casper
Max Nadeau
Dylan Hadfield-Menell
Gabriel Kreiman
AAML
07 Oct 2021
Detection Accuracy for Evaluating Compositional Explanations of Units
Sayo M. Makinwa
Biagio La Rosa
Roberto Capobianco
FAtt
CoGe
16 Sep 2021
A Bayesian Framework for Information-Theoretic Probing
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Tiago Pimentel
Robert Bamler
08 Sep 2021
Neuron-level Interpretation of Deep NLP Models: A Survey
Transactions of the Association for Computational Linguistics (TACL), 2021
Hassan Sajjad
Nadir Durrani
Fahim Dalvi
MILM
AI4CE
30 Aug 2021
Explaining Bayesian Neural Networks
Kirill Bykov
Marina M.-C. Höhne
Adelaida Creosteanu
Klaus-Robert Muller
Frederick Klauschen
Shinichi Nakajima
Matthias Kirchler
BDL
AAML
23 Aug 2021
Post-hoc Interpretability for Neural NLP: A Survey
ACM Computing Surveys (ACM CSUR), 2021
Andreas Madsen
Siva Reddy
A. Chandar
XAI
10 Aug 2021
Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning
Kaylee Burns
Christopher D. Manning
Li Fei-Fei
20 Jul 2021
PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition
Neural Information Processing Systems (NeurIPS), 2021
Cheng-I Jeff Lai
Yang Zhang
Alexander H. Liu
Shiyu Chang
Yi-Lun Liao
Yung-Sung Chuang
Kaizhi Qian
Sameer Khurana
David D. Cox
James R. Glass
VLM
10 Jun 2021
Improving Compositionality of Neural Networks by Decoding Representations to Inputs
Neural Information Processing Systems (NeurIPS), 2021
Mike Wu
Noah D. Goodman
Stefano Ermon
AI4CE
01 Jun 2021
On the Interplay Between Fine-tuning and Composition in Transformers
Findings of the Association for Computational Linguistics (ACL Findings), 2021
Lang-Chi Yu
Allyson Ettinger
31 May 2021
The Definitions of Interpretability and Learning of Interpretable Models
Weishen Pan
Changshui Zhang
FaML
XAI
29 May 2021
Fine-grained Interpretation and Causation Analysis in Deep NLP Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Hassan Sajjad
Narine Kokhlikyan
Fahim Dalvi
Nadir Durrani
MILM
17 May 2021
Connecting Attributions and QA Model Behavior on Realistic Counterfactuals
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Xi Ye
Rohan Nair
Greg Durrett
09 Apr 2021
The Mind's Eye: Visualizing Class-Agnostic Features of CNNs
IEEE International Conference on Image Processing (ICIP), 2021
Alexandros Stergiou
FAtt
29 Jan 2021
FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Han Guo
Nazneen Rajani
Peter Hase
Joey Tianyi Zhou
Caiming Xiong
TDI
31 Dec 2020
Transformer Feed-Forward Layers Are Key-Value Memories
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Mor Geva
R. Schuster
Jonathan Berant
Omer Levy
KELM
29 Dec 2020
Revisiting Edge Detection in Convolutional Neural Networks
IEEE International Joint Conference on Neural Networks (IJCNN), 2020
Minh Le
Subhradeep Kayal
FAtt
25 Dec 2020
Achilles Heels for AGI/ASI via Decision Theoretic Adversaries
Stephen L. Casper
12 Oct 2020
LIMEADE: From AI Explanations to Advice Taking
Benjamin Charles Germain Lee
Doug Downey
Kyle Lo
Daniel S. Weld
09 Mar 2020
Frivolous Units: Wider Networks Are Not Really That Wide
AAAI Conference on Artificial Intelligence (AAAI), 2019
Stephen Casper
Xavier Boix
Vanessa D’Amario
Ling Guo
Martin Schrimpf
Kasper Vinken
Gabriel Kreiman
10 Dec 2019
Discovering the Compositional Structure of Vector Representations with Role Learning Networks
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2019
Paul Soulos
R. Thomas McCoy
Tal Linzen
P. Smolensky
CoGe
21 Oct 2019
Considerations When Learning Additive Explanations for Black-Box Models
S. Tan
Giles Hooker
Paul Koch
Albert Gordo
R. Caruana
FAtt
26 Jan 2018