Using Degeneracy in the Loss Landscape for Mechanistic Interpretability

17 May 2024
Lucius Bushnaq
Jake Mendel
Stefan Heimersheim
Dan Braun
Nicholas Goldowsky-Dill
Kaarel Hänni
Cindy Wu
Marius Hobbhahn
arXiv:2405.10927

Papers citing "Using Degeneracy in the Loss Landscape for Mechanistic Interpretability"

Identifying Sparsely Active Circuits Through Local Loss Landscape Decomposition
Brianna Chrisman
Lucius Bushnaq
Lee D. Sharkey
31 Mar 2025
NGD converges to less degenerate solutions than SGD
Moosa Saghir
N. R. Raghavendra
Zihe Liu
Evan Ryan Gunter
07 Sep 2024
Cluster-norm for Unsupervised Probing of Knowledge
Walter Laurito
Sharan Maiya
Grégoire Dhimoïla
Owen Yeung
Kaarel Hänni
26 Jul 2024
Weight-based Decomposition: A Case for Bilinear MLPs
Michael T. Pearce
Thomas Dooms
Alice Rigg
06 Jun 2024
Review and Prospect of Algebraic Research in Equivalent Framework between Statistical Mechanics and Machine Learning Theory
Sumio Watanabe
31 May 2024
The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks
Lucius Bushnaq
Stefan Heimersheim
Nicholas Goldowsky-Dill
Dan Braun
Jake Mendel
Kaarel Hänni
Avery Griffin
Jörn Stöhler
Magdalena Wache
Marius Hobbhahn
FAtt
17 May 2024
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
01 Nov 2022