Low-Complexity Probing via Finding Subnetworks

North American Chapter of the Association for Computational Linguistics (NAACL), 2021
8 April 2021
Steven Cao
Victor Sanh
Alexander M. Rush
arXiv: 2104.03514

Papers citing "Low-Complexity Probing via Finding Subnetworks"

50 / 53 papers shown
Weight-sparse transformers have interpretable circuits
Leo Gao
Achyuta Rajaram
Jacob Coxon
Soham V. Govande
Bowen Baker
Dan Mossing
MILM
237
7
0
17 Nov 2025
PAHQ: Accelerating Automated Circuit Discovery through Mixed-Precision Inference Optimization
Xinhai Wang
Shu Yang
Liangyu Wang
L. Zhang
Huanyi Xie
Lijie Hu
Di Wang
200
2
0
27 Oct 2025
C-SWAP: Explainability-Aware Structured Pruning for Efficient Neural Networks Compression
Baptiste Bauvin
Loïc Baret
Ola Ahmad
132
0
0
21 Oct 2025
Discovering Transformer Circuits via a Hybrid Attribution and Pruning Framework
Hao Gu
Vibhas Nair
Amrithaa Ashok Kumar
Jayvart Sharma
Ryan Lagasse
99
1
0
28 Sep 2025
Towards Transparent AI: A Survey on Explainable Language Models
Avash Palikhe
Sribala Vidyadhari Chinta
Zhipeng Yin
Rui Guo
Qiang Duan
Jie Yang
Wenbin Zhang
185
2
0
25 Sep 2025
From Indirect Object Identification to Syllogisms: Exploring Binary Mechanisms in Transformer Circuits
Jiaqi W. Ma
Shichang Zhang
119
0
0
22 Aug 2025
On the Performance of Concept Probing: The Influence of the Data (Extended Version)
Manuel de Sousa Ribeiro
Afonso Leote
João Leite
197
1
0
24 Jul 2025
Concept Probing: Where to Find Human-Defined Concepts (Extended Version)
Manuel de Sousa Ribeiro
Afonso Leote
João Leite
189
1
0
24 Jul 2025
What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models
Keyon Vafa
Peter G. Chang
Ashesh Rambachan
S. Mullainathan
639
16
0
09 Jul 2025
Stochastic Parameter Decomposition
Lucius Bushnaq
Dan Braun
Lee D. Sharkey
221
8
0
25 Jun 2025
Line of Sight: On Linear Representations in VLLMs
Achyuta Rajaram
Sarah Schwettmann
Jacob Andreas
Arthur Conmy
VLM
315
2
0
05 Jun 2025
Analyzing the Inner Workings of Transformers in Compositional Generalization
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Ryoma Kumon
Hitomi Yanaka
327
1
0
24 Feb 2025
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Philipp Mondorf
Sondre Wold
Yun Xue
501
2
0
02 Oct 2024
Optimal ablation for interpretability
Neural Information Processing Systems (NeurIPS), 2024
Maximilian Li
Lucas Janson
FAtt
343
12
0
16 Sep 2024
Explaining Human Comparisons using Alignment-Importance Heatmaps
Nhut Truong
Dario Pesenti
Uri Hasson
183
1
0
08 Sep 2024
The Quest for the Right Mediator: Surveying Mechanistic Interpretability Through the Lens of Causal Mediation Analysis
Computational Linguistics (CL), 2024
Aaron Mueller
Jannik Brinkmann
Millicent Li
Samuel Marks
Koyena Pal
...
Arnab Sen Sharma
Jiuding Sun
Eric Todd
David Bau
Yonatan Belinkov
CML
511
34
0
02 Aug 2024
Tracking linguistic information in transformer-based sentence embeddings through targeted sparsification
Vivi Nastase
Paola Merlo
199
6
0
25 Jul 2024
Investigating the Indirect Object Identification circuit in Mamba
Danielle Ensign
Adrià Garriga-Alonso
Mamba
170
0
0
19 Jul 2024
InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques
Rohan Gupta
Iván Arcuschin
Thomas Kwa
Adrià Garriga-Alonso
351
6
0
19 Jul 2024
Sheaf Discovery with Joint Computation Graph Pruning and Flexible Granularity
Lei Yu
Jingcheng Niu
Zining Zhu
Xi Chen
Gerald Penn
221
9
0
04 Jul 2024
Are there identifiable structural parts in the sentence embedding whole?
Vivi Nastase
Paola Merlo
200
5
0
24 Jun 2024
Finding Transformer Circuits with Edge Pruning
Adithya Bhaskar
Alexander Wettig
Dan Friedman
Danqi Chen
471
36
0
24 Jun 2024
Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models
Charles O'Neill
Thang Bui
209
12
0
21 May 2024
Automatic Discovery of Visual Circuits
Achyuta Rajaram
Neil Chowdhury
Antonio Torralba
Jacob Andreas
Sarah Schwettmann
GNN
186
7
0
22 Apr 2024
Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska
E. Gavves
AI4CE
383
301
0
22 Apr 2024
Decomposing and Editing Predictions by Modeling Model Computation
Harshay Shah
Andrew Ilyas
Aleksander Madry
KELM
296
24
0
17 Apr 2024
Embedded Named Entity Recognition using Probing Classifiers
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Nicholas Popovic
Michael Färber
234
3
0
18 Mar 2024
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Boyi Wei
Kaixuan Huang
Yangsibo Huang
Tinghao Xie
Xiangyu Qi
Mengzhou Xia
Prateek Mittal
Mengdi Wang
Peter Henderson
AAML
331
174
0
07 Feb 2024
Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Ting-Yun Chang
Jesse Thomason
Robin Jia
323
26
0
15 Nov 2023
Uncovering Intermediate Variables in Transformers using Circuit Probing
Michael A. Lepori
Thomas Serre
Ellie Pavlick
399
11
0
07 Nov 2023
Attribution Patching Outperforms Automated Circuit Discovery
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023
Aaquib Syed
Can Rager
Arthur Conmy
369
102
0
16 Oct 2023
SPADE: Sparsity-Guided Debugging for Deep Neural Networks
International Conference on Machine Learning (ICML), 2023
Arshia Soltani Moakhar
Eugenia Iofinova
Elias Frantar
Dan Alistarh
332
2
0
06 Oct 2023
Discovering Knowledge-Critical Subnetworks in Pretrained Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Deniz Bayazit
Negar Foroutan
Zeming Chen
Gail Weiss
Antoine Bosselut
KELM
261
19
0
04 Oct 2023
Circuit Breaking: Removing Model Behaviors with Targeted Ablation
Maximilian Li
Xander Davies
Max Nadeau
KELMMU
306
34
0
12 Sep 2023
NeuroSurgeon: A Toolkit for Subnetwork Analysis
Michael A. Lepori
Ellie Pavlick
Thomas Serre
202
9
0
01 Sep 2023
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Tom Lieberum
Matthew Rahtz
János Kramár
Neel Nanda
G. Irving
Rohin Shah
Vladimir Mikulik
323
141
0
18 Jul 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee
Neel Nanda
Matthew Pauly
Katherine Harvey
Dmitrii Troitskii
Dimitris Bertsimas
MILM
540
291
0
02 May 2023
Towards Automated Circuit Discovery for Mechanistic Interpretability
Neural Information Processing Systems (NeurIPS), 2023
Arthur Conmy
Augustine N. Mavor-Parker
Aengus Lynch
Stefan Heimersheim
Adrià Garriga-Alonso
540
452
0
28 Apr 2023
Break It Down: Evidence for Structural Compositionality in Neural Networks
Neural Information Processing Systems (NeurIPS), 2023
Michael A. Lepori
Thomas Serre
Ellie Pavlick
335
52
0
26 Jan 2023
CREPE: Can Vision-Language Foundation Models Reason Compositionally?
Computer Vision and Pattern Recognition (CVPR), 2022
Zixian Ma
Jerry Hong
Mustafa Omer Gul
Mona Gandhi
Irena Gao
Ranjay Krishna
CoGe
376
183
0
13 Dec 2022
The Architectural Bottleneck Principle
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Tiago Pimentel
Josef Valvoda
Niklas Stoehr
Robert Bamler
190
5
0
11 Nov 2022
SocioProbe: What, When, and Where Language Models Learn about Sociodemographics
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Anne Lauscher
Federico Bianchi
Samuel R. Bowman
Dirk Hovy
221
10
0
08 Nov 2022
Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task
International Conference on Learning Representations (ICLR), 2022
Kenneth Li
Aspen K. Hopkins
David Bau
Fernanda Viégas
Hanspeter Pfister
Martin Wattenberg
MILM
601
386
0
24 Oct 2022
The Open-World Lottery Ticket Hypothesis for OOD Intent Classification
International Conference on Language Resources and Evaluation (LREC), 2022
Yunhua Zhou
Pengyu Wang
Peiju Liu
Yuxin Wang
Xipeng Qiu
333
2
0
13 Oct 2022
Probing via Prompting
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Jiaoda Li
Robert Bamler
Mrinmaya Sachan
261
14
0
04 Jul 2022
Diverse Lottery Tickets Boost Ensemble from a Single Pretrained Model
Sosuke Kobayashi
Shun Kiyono
Jun Suzuki
Kentaro Inui
MoMe
225
10
0
24 May 2022
Visualizing the Relationship Between Encoded Linguistic Information and Task Performance
Findings, 2022
Jiannan Xiang
Huayang Li
Defu Lian
Guoping Huang
Taro Watanabe
Lemao Liu
142
1
0
29 Mar 2022
The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail
Sam Bowman
OffRL
373
48
0
15 Oct 2021
Conditional probing: measuring usable information beyond a baseline
John Hewitt
Kawin Ethayarajh
Abigail Z. Jacobs
Christopher D. Manning
208
63
0
19 Sep 2021
How Does Adversarial Fine-Tuning Benefit BERT?
J. Ebrahimi
Hao Yang
Wei Zhang
AAML
254
6
0
31 Aug 2021