Compositional Explanations of Neurons

Neural Information Processing Systems (NeurIPS), 2020
24 June 2020
Jesse Mu
Jacob Andreas
FAtt, CoGe, MILM

Papers citing "Compositional Explanations of Neurons"

50 / 146 papers shown
Understanding polysemanticity in neural networks through coding theory
Simon C. Marshall
Jan H. Kirchner
FAtt, MILM, AAML
186
15
0
31 Jan 2024
Rethinking Interpretability in the Era of Large Language Models
Chandan Singh
J. Inala
Michel Galley
Rich Caruana
Jianfeng Gao
LRM, AI4CE
300
115
0
30 Jan 2024
Towards Generating Informative Textual Description for Neurons in Language Models
Shrayani Mondal
Rishabh Garodia
Arbaaz Qureshi
Taesung Lee
Youngja Park
MILM
180
1
0
30 Jan 2024
Knowledge-Aware Neuron Interpretation for Scene Classification
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yong Guan
Freddy Lecue
Jiaoyan Chen
Ru Li
Jeff Z. Pan
193
2
0
29 Jan 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Conference on Fairness, Accountability and Transparency (FAccT), 2024
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
562
135
0
25 Jan 2024
Universal Neurons in GPT2 Language Models
Wes Gurnee
Theo Horsley
Zifan Carl Guo
Tara Rezaei Kheirkhah
Qinyi Sun
Will Hathaway
Neel Nanda
Dimitris Bertsimas
MILM
348
81
0
22 Jan 2024
Explaining the Implicit Neural Canvas: Connecting Pixels to Neurons by Tracing their Contributions
Namitha Padmanabhan
M. Gwilliam
Pulkit Kumar
Shishira R. Maiya
Max Ehrlich
Abhinav Shrivastava
323
4
1
18 Jan 2024
Manipulating Feature Visualizations with Gradient Slingshots
Dilyara Bareeva
Marina M.-C. Höhne
Alexander Warnecke
Lukas Pirch
Klaus-Robert Müller
Konrad Rieck
Sebastian Lapuschkin
Kirill Bykov
AAML
406
6
0
11 Jan 2024
MAMI: Multi-Attentional Mutual-Information for Long Sequence Neuron Captioning
Alfirsa Damasyifa Fauzulhaq
Wahyu Parwitayasa
Joseph A. Sugihdharma
M. F. Ridhani
N. Yudistira
195
0
0
05 Jan 2024
Large Language Models Relearn Removed Concepts
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Michelle Lo
Shay B. Cohen
Fazl Barez
KELM
223
28
0
03 Jan 2024
Concept-based Explainable Artificial Intelligence: A Survey
Eleonora Poeta
Gabriele Ciravegna
Eliana Pastor
Tania Cerquitelli
Elena Baralis
LRM, XAI
271
92
0
20 Dec 2023
A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Giovanni Monea
Maxime Peyrard
Martin Josifoski
Vishrav Chaudhary
Jason Eisner
Emre Kiciman
Hamid Palangi
Barun Patra
Robert West
KELM
505
25
0
04 Dec 2023
Adversarial Doodles: Interpretable and Human-drawable Attacks Provide Describable Insights
Ryoya Nara
Yusuke Matsui
AAML
282
0
0
27 Nov 2023
Labeling Neural Representations with Inverse Recognition
Neural Information Processing Systems (NeurIPS), 2023
Kirill Bykov
Laura Kopf
Shinichi Nakajima
Matthias Kirchler
Marina M.-C. Höhne
BDL
453
27
0
22 Nov 2023
Investigating the Encoding of Words in BERT's Neurons using Feature Textualization
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023
Tanja Baeumel
Soniya Vijayakumar
Josef van Genabith
Guenter Neumann
Simon Ostermann
MILM
255
3
0
14 Nov 2023
Interpreting Pretrained Language Models via Concept Bottlenecks
Zhen Tan
Lu Cheng
Song Wang
Yuan Bo
Wenlin Yao
Huan Liu
LRM
236
35
0
08 Nov 2023
Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models
Michael Lan
Phillip H. S. Torr
Fazl Barez
LRM
379
8
0
07 Nov 2023
Towards a fuller understanding of neurons with Clustered Compositional Explanations
Neural Information Processing Systems (NeurIPS), 2023
Biagio La Rosa
Leilani H. Gilpin
Roberto Capobianco
225
14
0
27 Oct 2023
Codebook Features: Sparse and Discrete Interpretability for Neural Networks
International Conference on Machine Learning (ICML), 2023
Alex Tamkin
Mohammad Taufeeque
Noah D. Goodman
220
41
0
26 Oct 2023
How do Language Models Bind Entities in Context?
International Conference on Learning Representations (ICLR), 2023
Jiahai Feng
Jacob Steinhardt
325
65
0
26 Oct 2023
Corrupting Neuron Explanations of Deep Visual Features
IEEE International Conference on Computer Vision (ICCV), 2023
Divyansh Srivastava
Tuomas P. Oikarinen
Tsui-Wei Weng
FAtt, AAML
128
3
0
25 Oct 2023
From Neural Activations to Concepts: A Survey on Explaining Concepts in Neural Networks
Jae Hee Lee
Sergio Lanza
Stefan Wermter
238
18
0
18 Oct 2023
Copy Suppression: Comprehensively Understanding an Attention Head
Callum McDougall
Arthur Conmy
Cody Rushing
Thomas McGrath
Neel Nanda
MILM
268
54
0
06 Oct 2023
From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Xuansheng Wu
Wenlin Yao
Jianshu Chen
Xiaoman Pan
Xiaoyang Wang
Ninghao Liu
Dong Yu
LRM
275
51
0
30 Sep 2023
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
International Conference on Learning Representations (ICLR), 2023
Fred Zhang
Neel Nanda
LLMSV
531
175
0
27 Sep 2023
Rigorously Assessing Natural Language Explanations of Neurons
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023
Jing-ling Huang
Atticus Geiger
Karel D'Oosterlinck
Zhengxuan Wu
Christopher Potts
MILM
241
40
0
19 Sep 2023
FIND: A Function Description Benchmark for Evaluating Interpretability Methods
Neural Information Processing Systems (NeurIPS), 2023
Sarah Schwettmann
Tamar Rott Shaham
Joanna Materzyńska
Neil Chowdhury
Shuang Li
Jacob Andreas
David Bau
Antonio Torralba
265
31
0
07 Sep 2023
Explainability for Large Language Models: A Survey
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Haiyan Zhao
Hanjie Chen
Fan Yang
Ninghao Liu
Huiqi Deng
Hengyi Cai
Shuaiqiang Wang
D. Yin
Jundong Li
LRM
500
710
0
02 Sep 2023
Emergent Linear Representations in World Models of Self-Supervised Sequence Models
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023
Neel Nanda
Andrew Lee
Martin Wattenberg
FAtt, MILM
316
260
0
02 Sep 2023
Identifying Interpretable Subspaces in Image Representations
International Conference on Machine Learning (ICML), 2023
Neha Kalibhat
S. Bhardwaj
Bayan Bruss
Hamed Firooz
Maziar Sanjabi
Soheil Feizi
FAtt
303
34
0
20 Jul 2023
Hierarchical Semantic Tree Concept Whitening for Interpretable Image Classification
Haixing Dai
Lu Zhang
Lin Zhao
Zihao Wu
Zheng Liu
...
Yanjun Lyu
Changying Li
Ninghao Liu
Tianming Liu
Dajiang Zhu
259
8
0
10 Jul 2023
Dear XAI Community, We Need to Talk! Fundamental Misconceptions in Current XAI Research
Timo Freiesleben
Gunnar König
157
28
0
07 Jun 2023
A Survey on Explainability of Graph Neural Networks
IEEE Data Engineering Bulletin (IEEE Data Eng. Bull.), 2023
Jaykumar Kakkad
Jaspal Jannu
Kartik Sharma
Charu C. Aggarwal
Sourav Medya
252
55
0
02 Jun 2023
Neuron to Graph: Interpreting Language Model Neurons at Scale
Alex Foote
Neel Nanda
Esben Kran
Ioannis Konstas
Shay B. Cohen
Fazl Barez
MILM
203
27
0
31 May 2023
NeuroX Library for Neuron Analysis of Deep NLP Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Fahim Dalvi
Hassan Sajjad
Nadir Durrani
239
14
0
26 May 2023
FICNN: A Framework for the Interpretation of Deep Convolutional Neural Networks
Hamed Behzadi-Khormouji
José Oramas
165
0
0
17 May 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee
Neel Nanda
Matthew Pauly
Katherine Harvey
Dmitrii Troitskii
Dimitris Bertsimas
MILM
540
291
0
02 May 2023
Towards Automated Circuit Discovery for Mechanistic Interpretability
Neural Information Processing Systems (NeurIPS), 2023
Arthur Conmy
Augustine N. Mavor-Parker
Aengus Lynch
Stefan Heimersheim
Adrià Garriga-Alonso
542
460
0
28 Apr 2023
Concept-Monitor: Understanding DNN training through individual neurons
Mohammad Ali Khan
Tuomas P. Oikarinen
Tsui-Wei Weng
244
3
0
26 Apr 2023
N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models
Alex Foote
Neel Nanda
Esben Kran
Ioannis Konstas
Fazl Barez
MILM
157
4
0
22 Apr 2023
LINe: Out-of-Distribution Detection by Leveraging Important Neurons
Computer Vision and Pattern Recognition (CVPR), 2023
Yong Hyun Ahn
Gyeong-Moon Park
Seong Tae Kim
OODD
312
44
0
24 Mar 2023
Unsupervised Interpretable Basis Extraction for Concept-Based Visual Explanations
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2023
Alexandros Doumanoglou
S. Asteriadis
D. Zarpalas
FAtt, SSL
219
6
0
19 Mar 2023
Red Teaming Deep Neural Networks with Feature Synthesis Tools
Neural Information Processing Systems (NeurIPS), 2023
Stephen Casper
Yuxiao Li
Jiawei Li
Tong Bu
Ke Zhang
K. Hariharan
Dylan Hadfield-Menell
AAML
406
21
0
08 Feb 2023
A Survey of Explainable AI in Deep Visual Modeling: Methods and Metrics
Naveed Akhtar
XAI, VLM
202
9
0
31 Jan 2023
Evaluating Neuron Interpretation Methods of NLP Models
Neural Information Processing Systems (NeurIPS), 2023
Yimin Fan
Fahim Dalvi
Nadir Durrani
Hassan Sajjad
273
9
0
30 Jan 2023
Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models
Neural Information Processing Systems (NeurIPS), 2023
Peter Hase
Joey Tianyi Zhou
Been Kim
Asma Ghandeharioun
MILM
348
234
0
10 Jan 2023
Can Large Language Models Change User Preference Adversarially?
Varshini Subhash
AAML
191
9
0
05 Jan 2023
Teaching Matters: Investigating the Role of Supervision in Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2022
Matthew Walmer
Saksham Suri
Kamal Gupta
Abhinav Shrivastava
378
40
0
07 Dec 2022
What learning algorithm is in-context learning? Investigations with linear models
International Conference on Learning Representations (ICLR), 2022
Ekin Akyürek
Dale Schuurmans
Jacob Andreas
Tengyu Ma
Denny Zhou
548
620
0
28 Nov 2022
Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification
Computer Vision and Pattern Recognition (CVPR), 2022
Yue Yang
Artemis Panagopoulou
Shenghao Zhou
Daniel Jin
Chris Callison-Burch
Mark Yatskar
406
311
0
21 Nov 2022