Circuit Breaking: Removing Model Behaviors with Targeted Ablation

12 September 2023
Maximilian Li
Xander Davies
Max Nadeau
KELM, MU

Papers citing "Circuit Breaking: Removing Model Behaviors with Targeted Ablation"

13 / 13 papers shown
Grounding or Guessing? Visual Signals for Detecting Hallucinations in Sign Language Translation
Yasser Hamidullah
Koel Dutta Chowdhury
Yusser Al Ghussin
Shakib Yazdani
Cennet Oguz
Josef van Genabith
C. España-Bonet
133
0
0
21 Oct 2025
Interpretability as Alignment: Making Internal Understanding a Design Principle
Aadit Sengupta
Pratinav Seth
Vinay Kumar Sankarapu
AI4CE, AAML
121
0
0
10 Sep 2025
IF-GUIDE: Influence Function-Guided Detoxification of LLMs
Zachary Coalson
Juhan Bae
Nicholas Carlini
Sanghyun Hong
TDI
349
1
0
02 Jun 2025
Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
Tianyi Lorena Yan
Robin Jia
KELM, MU
276
0
0
27 Feb 2025
Missed Causes and Ambiguous Effects: Counterfactuals Pose Challenges for Interpreting Neural Networks
Aaron Mueller
CML
192
16
0
05 Jul 2024
Sheaf Discovery with Joint Computation Graph Pruning and Flexible Granularity
Lei Yu
Jingcheng Niu
Zining Zhu
Xi Chen
Gerald Penn
175
9
0
04 Jul 2024
Knowledge Circuits in Pretrained Transformers
Yunzhi Yao
Ningyu Zhang
Zekun Xi
Meng Wang
Ziwen Xu
Shumin Deng
Huajun Chen
KELM
349
41
0
28 May 2024
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
Aleksandar Makelov
Georg Lange
Neel Nanda
315
58
0
14 May 2024
Decomposing and Editing Predictions by Modeling Model Computation
Harshay Shah
Andrew Ilyas
Aleksander Madry
KELM
258
23
0
17 Apr 2024
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Zhengxuan Wu
Atticus Geiger
Aryaman Arora
Jing-ling Huang
Zheng Wang
Noah D. Goodman
Christopher D. Manning
Christopher Potts
MU
209
43
0
12 Mar 2024
SoK: Memorization in General-Purpose Large Language Models
Valentin Hartmann
Anshuman Suri
Vincent Bindschaedler
David Evans
Shruti Tople
Robert West
KELM, LLMAG
288
35
0
24 Oct 2023
NeuroSurgeon: A Toolkit for Subnetwork Analysis
Michael A. Lepori
Ellie Pavlick
Thomas Serre
168
8
0
01 Sep 2023
A Unified Approach to Interpreting Model Predictions
Scott M. Lundberg
Su-In Lee
FAtt
2.7K
28,515
0
22 May 2017