Neuron-level Interpretation of Deep NLP Models: A Survey

30 August 2021
Hassan Sajjad
Nadir Durrani
Fahim Dalvi
    MILM
    AI4CE
ArXiv · PDF · HTML

Papers citing "Neuron-level Interpretation of Deep NLP Models: A Survey"

50 of 65 citing papers shown
Disentangling Linguistic Features with Dimension-Wise Analysis of Vector Embeddings
Saniya Karwa
Navpreet Singh
CoGe
34
0
0
20 Apr 2025
Neuron-level Balance between Stability and Plasticity in Deep Reinforcement Learning
Jiahua Lan
Sen Zhang
Haixia Pan
Ruijun Liu
Li Shen
Dacheng Tao
CLL
23
0
0
09 Apr 2025
Following the Whispers of Values: Unraveling Neural Mechanisms Behind Value-Oriented Behaviors in LLMs
Ling Hu
Yuemei Xu
Xiaoyang Gu
Letao Han
28
0
0
07 Apr 2025
Everything, Everywhere, All at Once: Is Mechanistic Interpretability Identifiable?
Maxime Méloux
Silviu Maniu
François Portet
Maxime Peyrard
34
0
0
28 Feb 2025
Sparse Auto-Encoder Interprets Linguistic Features in Large Language Models
Yi Jing
Zijun Yao
Lingxu Ran
Hongzhu Guo
Xiaozhi Wang
Lei Hou
Juanzi Li
52
0
0
27 Feb 2025
On Relation-Specific Neurons in Large Language Models
Yihong Liu
Runsheng Chen
Lea Hirlimann
Ahmad Dawar Hakimi
Mingyang Wang
Amir Hossein Kargaran
S. Rothe
François Yvon
Hinrich Schütze
KELM
33
0
0
24 Feb 2025
Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey
Yunkai Dang
Kaichen Huang
Jiahao Huo
Yibo Yan
S. Huang
...
Kun Wang
Yong Liu
Jing Shao
Hui Xiong
Xuming Hu
LRM
96
14
0
03 Dec 2024
Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering
Zeping Yu
Sophia Ananiadou
55
0
0
17 Nov 2024
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
Yuqi Luo
Chenyang Song
Xu Han
Y. Chen
Chaojun Xiao
Zhiyuan Liu
Maosong Sun
47
3
0
04 Nov 2024
The Same But Different: Structural Similarities and Differences in Multilingual Language Modeling
Ruochen Zhang
Qinan Yu
Matianyu Zang
Carsten Eickhoff
Ellie Pavlick
43
1
0
11 Oct 2024
Mechanistic?
Naomi Saphra
Sarah Wiegreffe
AI4CE
21
9
0
07 Oct 2024
Investigating OCR-Sensitive Neurons to Improve Entity Recognition in Historical Documents
Emanuela Boros
Maud Ehrmann
31
0
0
25 Sep 2024
The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability
Aaron Mueller
Jannik Brinkmann
Millicent Li
Samuel Marks
Koyena Pal
...
Arnab Sen Sharma
Jiuding Sun
Eric Todd
David Bau
Yonatan Belinkov
CML
40
18
0
02 Aug 2024
LLM Circuit Analyses Are Consistent Across Training and Scale
Curt Tigges
Michael Hanna
Qinan Yu
Stella Biderman
31
10
0
15 Jul 2024
Towards Understanding Multi-Task Learning (Generalization) of LLMs via Detecting and Exploring Task-Specific Neurons
Yongqi Leng
Deyi Xiong
32
5
0
09 Jul 2024
Large Language Models Are Cross-Lingual Knowledge-Free Reasoners
Peng Hu
Sizhe Liu
Changjiang Gao
Xin Huang
Xue Han
Junlan Feng
Chao Deng
Shujian Huang
LRM
31
1
0
24 Jun 2024
MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model
Jiahao Huo
Yibo Yan
Boren Hu
Yutao Yue
Xuming Hu
LRM
MLLM
32
7
0
17 Jun 2024
No Two Devils Alike: Unveiling Distinct Mechanisms of Fine-tuning Attacks
Chak Tou Leong
Yi Cheng
Kaishuai Xu
Jian Wang
Hanlin Wang
Wenjie Li
AAML
41
17
0
25 May 2024
BadActs: A Universal Backdoor Defense in the Activation Space
Biao Yi
Sishuo Chen
Yiming Li
Tong Li
Baolei Zhang
Zheli Liu
AAML
22
5
0
18 May 2024
Challenges and Opportunities in Text Generation Explainability
Kenza Amara
R. Sevastjanova
Mennatallah El-Assady
SILM
20
2
0
14 May 2024
Linear Explanations for Individual Neurons
Tuomas P. Oikarinen
Tsui-Wei Weng
FAtt
MILM
29
5
0
10 May 2024
Revealing the Parametric Knowledge of Language Models: A Unified Framework for Attribution Methods
Haeun Yu
Pepa Atanasova
Isabelle Augenstein
KELM
29
4
0
29 Apr 2024
Detecting Conceptual Abstraction in LLMs
Michaela Regneri
Alhassan Abdelhalim
Soren Laue
25
1
0
24 Apr 2024
Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska
E. Gavves
AI4CE
38
111
0
22 Apr 2024
Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models
Tianyi Tang
Wenyang Luo
Haoyang Huang
Dongdong Zhang
Xiaolei Wang
Xin Zhao
Furu Wei
Ji-Rong Wen
46
46
0
26 Feb 2024
Towards Generating Informative Textual Description for Neurons in Language Models
Shrayani Mondal
Rishabh Garodia
Arbaaz Qureshi
Taesung Lee
Youngja Park
MILM
19
0
0
30 Jan 2024
CascadedGaze: Efficiency in Global Context Extraction for Image Restoration
Amirhosein Ghasemabadi
Muhammad Kamran Janjua
Mohammad Salameh
Chunhua Zhou
Fengyu Sun
Di Niu
17
11
0
26 Jan 2024
Understanding Distributed Representations of Concepts in Deep Neural Networks without Supervision
Wonjoon Chang
Dahee Kwon
Jaesik Choi
11
1
0
28 Dec 2023
XplainLLM: A QA Explanation Dataset for Understanding LLM Decision-Making
Zichen Chen
Jianda Chen
Mitali Gaidhani
Ambuj K. Singh
Misha Sra
18
4
0
15 Nov 2023
Investigating the Encoding of Words in BERT's Neurons using Feature Textualization
Tanja Baeumel
Soniya Vijayakumar
Josef van Genabith
Guenter Neumann
Simon Ostermann
MILM
21
3
0
14 Nov 2023
Multilingual Nonce Dependency Treebanks: Understanding how Language Models represent and process syntactic structure
David Arps
Laura Kallmeyer
Younes Samih
Hassan Sajjad
11
0
0
13 Nov 2023
Towards a fuller understanding of neurons with Clustered Compositional Explanations
Biagio La Rosa
Leilani H. Gilpin
Roberto Capobianco
17
2
0
27 Oct 2023
Cross-Modal Conceptualization in Bottleneck Models
Danis Alukaev
S. Kiselev
Ilya S. Pershin
Bulat Ibragimov
Vladimir Ivanov
Alexey Kornaev
Ivan Titov
23
6
0
23 Oct 2023
From Neural Activations to Concepts: A Survey on Explaining Concepts in Neural Networks
Jae Hee Lee
Sergio Lanza
Stefan Wermter
11
8
0
18 Oct 2023
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Samuel Marks
Max Tegmark
HILM
91
164
0
10 Oct 2023
The Importance of Prompt Tuning for Automated Neuron Explanations
Justin Lee
Tuomas P. Oikarinen
Arjun Chatha
Keng-Chi Chang
Yilan Chen
Tsui-Wei Weng
LRM
17
5
0
09 Oct 2023
Scaling up Discovery of Latent Concepts in Deep NLP Models
Majd Hawasly
Fahim Dalvi
Nadir Durrani
29
4
0
20 Aug 2023
Towards Explainable Evaluation Metrics for Machine Translation
Christoph Leiter
Piyawat Lertvittayakumjorn
M. Fomicheva
Wei-Ye Zhao
Yang Gao
Steffen Eger
ELM
21
11
0
22 Jun 2023
Wave to Syntax: Probing spoken language models for syntax
G. Shen
A. Alishahi
Arianna Bisazza
Grzegorz Chrupała
19
10
0
30 May 2023
Emergent Modularity in Pre-trained Transformers
Zhengyan Zhang
Zhiyuan Zeng
Yankai Lin
Chaojun Xiao
Xiaozhi Wang
Xu Han
Zhiyuan Liu
Ruobing Xie
Maosong Sun
Jie Zhou
MoE
37
23
0
28 May 2023
NeuroX Library for Neuron Analysis of Deep NLP Models
Fahim Dalvi
Hassan Sajjad
Nadir Durrani
17
9
0
26 May 2023
Can LLMs facilitate interpretation of pre-trained language models?
Basel Mousi
Nadir Durrani
Fahim Dalvi
36
12
0
22 May 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee
Neel Nanda
Matthew Pauly
Katherine Harvey
Dmitrii Troitskii
Dimitris Bertsimas
MILM
153
186
0
02 May 2023
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Michael Hanna
Ollie Liu
Alexandre Variengien
LRM
184
116
0
30 Apr 2023
NxPlain: Web-based Tool for Discovery of Latent Concepts
Fahim Dalvi
Nadir Durrani
Hassan Sajjad
Tamim Jaban
Musab Husaini
Ummar Abbas
13
1
0
06 Mar 2023
Evaluating Neuron Interpretation Methods of NLP Models
Yimin Fan
Fahim Dalvi
Nadir Durrani
Hassan Sajjad
32
7
0
30 Jan 2023
Interpretability in Activation Space Analysis of Transformers: A Focused Survey
Soniya Vijayakumar
AI4CE
20
3
0
22 Jan 2023
ConceptX: A Framework for Latent Concept Analysis
Firoj Alam
Fahim Dalvi
Nadir Durrani
Hassan Sajjad
A. Khan
Jia Xu
17
5
0
12 Nov 2022
Impact of Adversarial Training on Robustness and Generalizability of Language Models
Enes Altinisik
Hassan Sajjad
H. Sencar
Safa Messaoud
Sanjay Chawla
AAML
8
8
0
10 Nov 2022
On the Transformation of Latent Space in Fine-Tuned NLP Models
Nadir Durrani
Hassan Sajjad
Fahim Dalvi
Firoj Alam
27
17
0
23 Oct 2022