Compositional Explanations of Neurons

Neural Information Processing Systems (NeurIPS), 2020

24 June 2020

Papers citing "Compositional Explanations of Neurons"

46 / 146 papers shown

Diagnostics for Deep Neural Networks with Automated Copy/Paste Attacks

Stephen Casper

K. Hariharan

Dylan Hadfield-Menell

AAML

416

18 Nov 2022

Finding Skill Neurons in Pre-trained Transformer-based Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Xiaozhi Wang

Kaiyue Wen

Zhengyan Zhang

Lei Hou

Zhiyuan Liu

Juanzi Li

MILM MoE

197

14 Nov 2022

New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound

Neural Information Processing Systems (NeurIPS), 2022

Dingli Yu

139

05 Nov 2022

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small

International Conference on Learning Representations (ICLR), 2022

628

803

01 Nov 2022

Post-hoc analysis of Arabic transformer models

BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2022

134

18 Oct 2022

Global Concept-Based Interpretability for Graph Neural Networks via Neuron Analysis

AAAI Conference on Artificial Intelligence (AAAI), 2022

Xuanyuan Han

Pietro Barbiero

Dobrik Georgiev

Lucie Charlotte Magister

Pietro Lio

MILM

259

22 Aug 2022

Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks

Tilman Räuker

A. Ho

Stephen Casper

Dylan Hadfield-Menell

AAML AI4CE

787

170

27 Jul 2022

Interpretable by Design: Learning Predictors by Composing Interpretable Queries

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

257

03 Jul 2022

Analyzing Encoded Concepts in Transformer Language Models

North American Chapter of the Association for Computational Linguistics (NAACL), 2022

Firoj Alam

187

27 Jun 2022

Discovering Salient Neurons in Deep NLP Models

Journal of Machine Learning Research (JMLR), 2022

307

27 Jun 2022

Coupling Visual Semantics of Artificial Neural Networks and Human Brain Function via Synchronized Activations

IEEE Transactions on Cognitive and Developmental Systems (IEEE TCDS), 2022

Lu Zhang

...

Xiaoyan Cai

Xi Jiang

Sheng Li

Dajiang Zhu

Tianming Liu

150

22 Jun 2022

DORA: Exploring Outlier Representations in Deep Neural Networks

446

09 Jun 2022

Pruning for Feature-Preserving Circuits in CNNs

Christopher Hamblin

Talia Konkle

G. Alvarez

326

03 Jun 2022

CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks

International Conference on Learning Representations (ICLR), 2022

Tuomas P. Oikarinen

Tsui-Wei Weng

VLM

392

127

23 Apr 2022

Learning to Scaffold: Optimizing Model Explanations for Teaching

Neural Information Processing Systems (NeurIPS), 2022

Patrick Fernandes

Marcos Vinícius Treviso

Danish Pruthi

André F. T. Martins

Graham Neubig

FAtt

288

22 Apr 2022

HINT: Hierarchical Neuron Concept Explainer

Computer Vision and Pattern Recognition (CVPR), 2022

Andong Wang

Wei-Ning Lee

Xiaojuan Qi

195

27 Mar 2022

Towards Explainable Evaluation Metrics for Natural Language Generation

Christoph Leiter

Piyawat Lertvittayakumjorn

245

21 Mar 2022

Natural Language Descriptions of Deep Visual Features

International Conference on Learning Representations (ICLR), 2022

Antonio Torralba

994

150

26 Jan 2022

From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI

ACM Computing Surveys (ACM CSUR), 2022

619

577

20 Jan 2022

A Latent-Variable Model for Intrinsic Probing

AAAI Conference on Artificial Intelligence (AAAI), 2022

Karolina Stańczak

Lucas Torroba Hennigen

Adina Williams

Robert Bamler

Isabelle Augenstein

410

20 Jan 2022

Interpreting Arabic Transformer Models

152

19 Jan 2022

Forward Composition Propagation for Explainable Neural Reasoning

IEEE Computational Intelligence Magazine (IEEE CIM), 2021

172

23 Dec 2021

Can Explanations Be Useful for Calibrating Black Box Models?

Xi Ye

Greg Durrett

FAtt

247

14 Oct 2021

Quantifying Local Specialization in Deep Neural Networks

243

13 Oct 2021

Robust Feature-Level Adversaries are Interpretability Tools

Stephen Casper

Max Nadeau

Dylan Hadfield-Menell

Gabriel Kreiman

AAML

702

07 Oct 2021

Detection Accuracy for Evaluating Compositional Explanations of Units

263

16 Sep 2021

A Bayesian Framework for Information-Theoretic Probing

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

Tiago Pimentel

Robert Bamler

230

08 Sep 2021

Neuron-level Interpretation of Deep NLP Models: A Survey

Transactions of the Association for Computational Linguistics (TACL), 2021

321

30 Aug 2021

Explaining Bayesian Neural Networks

428

23 Aug 2021

Post-hoc Interpretability for Neural NLP: A Survey

ACM Computing Surveys (CSUR), 2021

Andreas Madsen

Siva Reddy

A. Chandar

XAI

370

281

10 Aug 2021

Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning

Kaylee Burns

Christopher D. Manning

Li Fei-Fei

178

20 Jul 2021

PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition

Neural Information Processing Systems (NeurIPS), 2021

Kaizhi Qian

305

10 Jun 2021

Improving Compositionality of Neural Networks by Decoding Representations to Inputs

Neural Information Processing Systems (NeurIPS), 2021

127

01 Jun 2021

On the Interplay Between Fine-tuning and Composition in Transformers

Findings of the Association for Computational Linguistics (Findings), 2021

Lang-Chi Yu

Allyson Ettinger

235

31 May 2021

The Definitions of Interpretability and Learning of Interpretable Models

Weishen Pan

Changshui Zhang

FaML XAI

108

29 May 2021

Fine-grained Interpretation and Causation Analysis in Deep NLP Models

North American Chapter of the Association for Computational Linguistics (NAACL), 2021

326

17 May 2021

Connecting Attributions and QA Model Behavior on Realistic Counterfactuals

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

Xi Ye

Rohan Nair

Greg Durrett

248

09 Apr 2021

The Mind's Eye: Visualizing Class-Agnostic Features of CNNs

IEEE International Conference on Image Processing (ICIP), 2021

Alexandros Stergiou

FAtt

131

29 Jan 2021

FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020

415

133

31 Dec 2020

Transformer Feed-Forward Layers Are Key-Value Memories

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020

650

1,177

29 Dec 2020

Revisiting Edge Detection in Convolutional Neural Networks

IEEE International Joint Conference on Neural Networks (IJCNN), 2020

Minh Le

Subhradeep Kayal

FAtt

236

25 Dec 2020

Achilles Heels for AGI/ASI via Decision Theoretic Adversaries

Stephen L. Casper

418

12 Oct 2020

LIMEADE: From AI Explanations to Advice Taking

Benjamin Charles Germain Lee

Doug Downey

Kyle Lo

Daniel S. Weld

335

09 Mar 2020

Frivolous Units: Wider Networks Are Not Really That Wide

AAAI Conference on Artificial Intelligence (AAAI), 2019

263

10 Dec 2019

Discovering the Compositional Structure of Vector Representations with Role Learning Networks

BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackBoxNLP), 2019

415

21 Oct 2019

Considerations When Learning Additive Explanations for Black-Box Models

403

26 Jan 2018