ResearchTrend.AI
Compositional Explanations of Neurons
Neural Information Processing Systems (NeurIPS), 2020
24 June 2020 · arXiv:2006.14032 (v2, latest)
Jesse Mu, Jacob Andreas
[FAtt, CoGe, MILM]

Papers citing "Compositional Explanations of Neurons"

46 / 146 papers shown
Diagnostics for Deep Neural Networks with Automated Copy/Paste Attacks
Stephen Casper, K. Hariharan, Dylan Hadfield-Menell
18 Nov 2022 · [AAML]

Finding Skill Neurons in Pre-trained Transformer-based Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Xiaozhi Wang, Kaiyue Wen, Zhengyan Zhang, Lei Hou, Zhiyuan Liu, Juanzi Li
14 Nov 2022 · [MILM, MoE]
New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound
Neural Information Processing Systems (NeurIPS), 2022
Arushi Gupta, Nikunj Saunshi, Dingli Yu, Kaifeng Lyu, Sanjeev Arora
05 Nov 2022 · [AAML, FAtt, XAI]

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
International Conference on Learning Representations (ICLR), 2022
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
01 Nov 2022

Post-hoc analysis of Arabic transformer models
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2022
Ahmed Abdelali, Nadir Durrani, Fahim Dalvi, Hassan Sajjad
18 Oct 2022
Global Concept-Based Interpretability for Graph Neural Networks via Neuron Analysis
AAAI Conference on Artificial Intelligence (AAAI), 2022
Xuanyuan Han, Pietro Barbiero, Dobrik Georgiev, Lucie Charlotte Magister, Pietro Lio
22 Aug 2022 · [MILM]
Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks
Tilman Räuker, A. Ho, Stephen Casper, Dylan Hadfield-Menell
27 Jul 2022 · [AAML, AI4CE]
Interpretable by Design: Learning Predictors by Composing Interpretable Queries
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Aditya Chattopadhyay, Stewart Slocum, B. Haeffele, René Vidal, D. Geman
03 Jul 2022

Analyzing Encoded Concepts in Transformer Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Hassan Sajjad, Nadir Durrani, Fahim Dalvi, Firoj Alam, A. Khan, Jia Xu
27 Jun 2022

Discovering Salient Neurons in Deep NLP Models
Journal of Machine Learning Research (JMLR), 2022
Nadir Durrani, Fahim Dalvi, Hassan Sajjad
27 Jun 2022 · [KELM, MILM]
Coupling Visual Semantics of Artificial Neural Networks and Human Brain Function via Synchronized Activations
IEEE Transactions on Cognitive and Developmental Systems (IEEE TCDS), 2022
Lin Zhao, Haixing Dai, Zihao Wu, Zhe Xiao, Lu Zhang, ..., Xiaoyan Cai, Xi Jiang, Sheng Li, Dajiang Zhu, Tianming Liu
22 Jun 2022
DORA: Exploring Outlier Representations in Deep Neural Networks
Kirill Bykov, Mayukh Deb, Dennis Grinwald, Klaus-Robert Müller, Marina M.-C. Höhne
09 Jun 2022
Pruning for Feature-Preserving Circuits in CNNs
Christopher Hamblin, Talia Konkle, G. Alvarez
03 Jun 2022

CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks
International Conference on Learning Representations (ICLR), 2022
Tuomas P. Oikarinen, Tsui-Wei Weng
23 Apr 2022 · [VLM]

Learning to Scaffold: Optimizing Model Explanations for Teaching
Neural Information Processing Systems (NeurIPS), 2022
Patrick Fernandes, Marcos Vinícius Treviso, Danish Pruthi, André F. T. Martins, Graham Neubig
22 Apr 2022 · [FAtt]

HINT: Hierarchical Neuron Concept Explainer
Computer Vision and Pattern Recognition (CVPR), 2022
Andong Wang, Wei-Ning Lee, Xiaojuan Qi
27 Mar 2022
Towards Explainable Evaluation Metrics for Natural Language Generation
Christoph Leiter, Piyawat Lertvittayakumjorn, M. Fomicheva, Wei Zhao, Yang Gao, Steffen Eger
21 Mar 2022 · [AAML, ELM]

Natural Language Descriptions of Deep Visual Features
International Conference on Learning Representations (ICLR), 2022
Evan Hernandez, Sarah Schwettmann, David Bau, Teona Bagashvili, Antonio Torralba, Jacob Andreas
26 Jan 2022 · [MILM]
From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI
ACM Computing Surveys (ACM CSUR), 2022
Meike Nauta, Jan Trienes, Shreyasi Pathak, Elisa Nguyen, Michelle Peters, Yasmin Schmitt, Jorg Schlotterer, M. V. Keulen, C. Seifert
20 Jan 2022 · [ELM, XAI]
A Latent-Variable Model for Intrinsic Probing
AAAI Conference on Artificial Intelligence (AAAI), 2022
Karolina Stańczak, Lucas Torroba Hennigen, Adina Williams, Robert Bamler, Isabelle Augenstein
20 Jan 2022
Interpreting Arabic Transformer Models
Ahmed Abdelali, Nadir Durrani, Fahim Dalvi, Hassan Sajjad
19 Jan 2022

Forward Composition Propagation for Explainable Neural Reasoning
IEEE Computational Intelligence Magazine (IEEE CIM), 2021
Isel Grau, Gonzalo Nápoles, M. Bello, Yamisleydi Salgueiro, A. Jastrzębska
23 Dec 2021
Can Explanations Be Useful for Calibrating Black Box Models?
Xi Ye, Greg Durrett
14 Oct 2021 · [FAtt]

Quantifying Local Specialization in Deep Neural Networks
Shlomi Hod, Daniel Filan, Stephen Casper, Andrew Critch, Stuart J. Russell
13 Oct 2021

Robust Feature-Level Adversaries are Interpretability Tools
Stephen Casper, Max Nadeau, Dylan Hadfield-Menell, Gabriel Kreiman
07 Oct 2021 · [AAML]

Detection Accuracy for Evaluating Compositional Explanations of Units
Sayo M. Makinwa, Biagio La Rosa, Roberto Capobianco
16 Sep 2021 · [FAtt, CoGe]
A Bayesian Framework for Information-Theoretic Probing
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Tiago Pimentel, Robert Bamler
08 Sep 2021

Neuron-level Interpretation of Deep NLP Models: A Survey
Transactions of the Association for Computational Linguistics (TACL), 2021
Hassan Sajjad, Nadir Durrani, Fahim Dalvi
30 Aug 2021 · [MILM, AI4CE]
Explaining Bayesian Neural Networks
Kirill Bykov, Marina M.-C. Höhne, Adelaida Creosteanu, Klaus-Robert Müller, Frederick Klauschen, Shinichi Nakajima, Matthias Kirchler
23 Aug 2021 · [BDL, AAML]
Post-hoc Interpretability for Neural NLP: A Survey
ACM Computing Surveys (CSUR), 2021
Andreas Madsen, Siva Reddy, A. Chandar
10 Aug 2021 · [XAI]

Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning
Kaylee Burns, Christopher D. Manning, Li Fei-Fei
20 Jul 2021

PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition
Neural Information Processing Systems (NeurIPS), 2021
Cheng-I Jeff Lai, Yang Zhang, Alexander H. Liu, Shiyu Chang, Yi-Lun Liao, Yung-Sung Chuang, Kaizhi Qian, Sameer Khurana, David D. Cox, James R. Glass
10 Jun 2021 · [VLM]

Improving Compositionality of Neural Networks by Decoding Representations to Inputs
Neural Information Processing Systems (NeurIPS), 2021
Mike Wu, Noah D. Goodman, Stefano Ermon
01 Jun 2021 · [AI4CE]
On the Interplay Between Fine-tuning and Composition in Transformers
Findings of the Association for Computational Linguistics (Findings), 2021
Lang-Chi Yu, Allyson Ettinger
31 May 2021
The Definitions of Interpretability and Learning of Interpretable Models
Weishen Pan, Changshui Zhang
29 May 2021 · [FaML, XAI]

Fine-grained Interpretation and Causation Analysis in Deep NLP Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Hassan Sajjad, Narine Kokhlikyan, Fahim Dalvi, Nadir Durrani
17 May 2021 · [MILM]
Connecting Attributions and QA Model Behavior on Realistic Counterfactuals
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Xi Ye, Rohan Nair, Greg Durrett
09 Apr 2021
The Mind's Eye: Visualizing Class-Agnostic Features of CNNs
IEEE International Conference on Image Processing (ICIP), 2021
Alexandros Stergiou
29 Jan 2021 · [FAtt]
FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Han Guo, Nazneen Rajani, Peter Hase, Joey Tianyi Zhou, Caiming Xiong
31 Dec 2020 · [TDI]

Transformer Feed-Forward Layers Are Key-Value Memories
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Mor Geva, R. Schuster, Jonathan Berant, Omer Levy
29 Dec 2020 · [KELM]
Revisiting Edge Detection in Convolutional Neural Networks
IEEE International Joint Conference on Neural Networks (IJCNN), 2020
Minh Le, Subhradeep Kayal
25 Dec 2020 · [FAtt]
Achilles Heels for AGI/ASI via Decision Theoretic Adversaries
Stephen L. Casper
12 Oct 2020

LIMEADE: From AI Explanations to Advice Taking
Benjamin Charles Germain Lee, Doug Downey, Kyle Lo, Daniel S. Weld
09 Mar 2020

Frivolous Units: Wider Networks Are Not Really That Wide
AAAI Conference on Artificial Intelligence (AAAI), 2019
Stephen Casper, Xavier Boix, Vanessa D'Amario, Ling Guo, Martin Schrimpf, Kasper Vinken, Gabriel Kreiman
10 Dec 2019

Discovering the Compositional Structure of Vector Representations with Role Learning Networks
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2019
Paul Soulos, R. Thomas McCoy, Tal Linzen, P. Smolensky
21 Oct 2019 · [CoGe]

Considerations When Learning Additive Explanations for Black-Box Models
S. Tan, Giles Hooker, Paul Koch, Albert Gordo, R. Caruana
26 Jan 2018 · [FAtt]
Page 3 of 3