ResearchTrend.AI
On the Pitfalls of Analyzing Individual Neurons in Language Models

14 October 2021
Omer Antverg
Yonatan Belinkov
    MILM
arXiv: 2110.07483

Papers citing "On the Pitfalls of Analyzing Individual Neurons in Language Models"

41 / 41 papers shown
Disentangling Linguistic Features with Dimension-Wise Analysis of Vector Embeddings
Saniya Karwa
Navpreet Singh
CoGe
34
0
0
20 Apr 2025
On Relation-Specific Neurons in Large Language Models
Yihong Liu
Runsheng Chen
Lea Hirlimann
Ahmad Dawar Hakimi
Mingyang Wang
Amir Hossein Kargaran
S. Rothe
François Yvon
Hinrich Schütze
KELM
36
0
0
24 Feb 2025
A Theoretical Survey on Foundation Models
Shi Fu
Yuzhu Chen
Yingjie Wang
Dacheng Tao
21
0
0
15 Oct 2024
Mechanistic?
Naomi Saphra
Sarah Wiegreffe
AI4CE
21
9
0
07 Oct 2024
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon
Roi Reichart
32
10
0
27 Jul 2024
Crafting Large Language Models for Enhanced Interpretability
Chung-En Sun
Tuomas P. Oikarinen
Tsui-Wei Weng
25
6
0
05 Jul 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
75
19
0
02 Jul 2024
Confidence Regulation Neurons in Language Models
Alessandro Stolfo
Ben Wu
Wes Gurnee
Yonatan Belinkov
Xingyi Song
Mrinmaya Sachan
Neel Nanda
29
12
0
24 Jun 2024
ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning
Ruchika Chavhan
Da Li
Timothy M. Hospedales
36
15
0
29 May 2024
BadActs: A Universal Backdoor Defense in the Activation Space
Biao Yi
Sishuo Chen
Yiming Li
Tong Li
Baolei Zhang
Zheli Liu
AAML
33
5
0
18 May 2024
Towards detecting unanticipated bias in Large Language Models
Anna Kruspe
28
3
0
03 Apr 2024
Tracing the Roots of Facts in Multilingual Language Models: Independent, Shared, and Transferred Knowledge
Xin Zhao
Naoki Yoshinaga
Daisuke Oba
KELM
HILM
22
10
0
08 Mar 2024
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Boyi Wei
Kaixuan Huang
Yangsibo Huang
Tinghao Xie
Xiangyu Qi
Mengzhou Xia
Prateek Mittal
Mengdi Wang
Peter Henderson
AAML
55
79
0
07 Feb 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
13
76
0
25 Jan 2024
Universal Neurons in GPT2 Language Models
Wes Gurnee
Theo Horsley
Zifan Carl Guo
Tara Rezaei Kheirkhah
Qinyi Sun
Will Hathaway
Neel Nanda
Dimitris Bertsimas
MILM
92
37
0
22 Jan 2024
Large Language Models Relearn Removed Concepts
Michelle Lo
Shay B. Cohen
Fazl Barez
KELM
13
14
0
03 Jan 2024
Towards a fuller understanding of neurons with Clustered Compositional Explanations
Biagio La Rosa
Leilani H. Gilpin
Roberto Capobianco
22
2
0
27 Oct 2023
DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models
Albert Garde
Esben Kran
Fazl Barez
11
2
0
03 Oct 2023
Rigorously Assessing Natural Language Explanations of Neurons
Jing-ling Huang
Atticus Geiger
Karel D'Oosterlinck
Zhengxuan Wu
Christopher Potts
MILM
16
25
0
19 Sep 2023
Explainability for Large Language Models: A Survey
Haiyan Zhao
Hanjie Chen
Fan Yang
Ninghao Liu
Huiqi Deng
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Mengnan Du
LRM
19
407
0
02 Sep 2023
Emergent Linear Representations in World Models of Self-Supervised Sequence Models
Neel Nanda
Andrew Lee
Martin Wattenberg
FAtt
MILM
37
141
0
02 Sep 2023
NeuroX Library for Neuron Analysis of Deep NLP Models
Fahim Dalvi
Hassan Sajjad
Nadir Durrani
25
9
0
26 May 2023
A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis
Alessandro Stolfo
Yonatan Belinkov
Mrinmaya Sachan
MILM
KELM
LRM
33
47
0
24 May 2023
Can LLMs facilitate interpretation of pre-trained language models?
Basel Mousi
Nadir Durrani
Fahim Dalvi
36
12
0
22 May 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee
Neel Nanda
Matthew Pauly
Katherine Harvey
Dmitrii Troitskii
Dimitris Bertsimas
MILM
153
186
0
02 May 2023
Redundancy and Concept Analysis for Code-trained Language Models
Arushi Sharma
Zefu Hu
Christopher Quinn
Ali Jannesari
64
1
0
01 May 2023
Evaluating Neuron Interpretation Methods of NLP Models
Yimin Fan
Fahim Dalvi
Nadir Durrani
Hassan Sajjad
35
8
0
30 Jan 2023
Finding Skill Neurons in Pre-trained Transformer-based Language Models
Xiaozhi Wang
Kaiyue Wen
Zhengyan Zhang
Lei Hou
Zhiyuan Liu
Juanzi Li
MILM
MoE
13
50
0
14 Nov 2022
Causal Analysis of Syntactic Agreement Neurons in Multilingual Language Models
Aaron Mueller
Yudi Xia
Tal Linzen
MILM
28
9
0
25 Oct 2022
Measures of Information Reflect Memorization Patterns
Rachit Bansal
Danish Pruthi
Yonatan Belinkov
25
8
0
17 Oct 2022
Measuring Causal Effects of Data Statistics on Language Model's 'Factual' Predictions
Yanai Elazar
Nora Kassner
Shauli Ravfogel
Amir Feder
Abhilasha Ravichander
Marius Mosbach
Yonatan Belinkov
Hinrich Schütze
Yoav Goldberg
CML
SyDa
MILM
23
52
0
28 Jul 2022
Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks
Tilman Räuker
A. Ho
Stephen Casper
Dylan Hadfield-Menell
AAML
AI4CE
18
124
0
27 Jul 2022
Discovering Salient Neurons in Deep NLP Models
Nadir Durrani
Fahim Dalvi
Hassan Sajjad
KELM
MILM
14
15
0
27 Jun 2022
IDANI: Inference-time Domain Adaptation via Neuron-level Interventions
Omer Antverg
Eyal Ben-David
Yonatan Belinkov
OOD
AI4CE
11
5
0
01 Jun 2022
Same Neurons, Different Languages: Probing Morphosyntax in Multilingual Pre-trained Models
Karolina Stańczak
E. Ponti
Lucas Torroba Hennigen
Ryan Cotterell
Isabelle Augenstein
MILM
LRM
22
10
0
04 May 2022
Analyzing Gender Representation in Multilingual Models
Hila Gonen
Shauli Ravfogel
Yoav Goldberg
10
11
0
20 Apr 2022
A Latent-Variable Model for Intrinsic Probing
Karolina Stańczak
Lucas Torroba Hennigen
Adina Williams
Ryan Cotterell
Isabelle Augenstein
21
4
0
20 Jan 2022
Sparse Interventions in Language Models with Differentiable Masking
Nicola De Cao
Leon Schmid
Dieuwke Hupkes
Ivan Titov
25
27
0
13 Dec 2021
On Neurons Invariant to Sentence Structural Changes in Neural Machine Translation
Gal Patel
Leshem Choshen
Omri Abend
31
2
0
06 Oct 2021
Neuron-level Interpretation of Deep NLP Models: A Survey
Hassan Sajjad
Nadir Durrani
Fahim Dalvi
MILM
AI4CE
22
79
0
30 Aug 2021
What you can cram into a single vector: Probing sentence embeddings for linguistic properties
Alexis Conneau
Germán Kruszewski
Guillaume Lample
Loïc Barrault
Marco Baroni
199
879
0
03 May 2018