2505.14424
Cited By
Explaining Neural Networks with Reasons
20 May 2025
Levin Hornischer
Hannes Leitgeb
FAtt
AAML
MILM
Papers citing "Explaining Neural Networks with Reasons"
3 / 3 papers shown
Title

Sparse Autoencoders Can Interpret Randomly Initialized Transformers
Thomas Heap, Tim Lawson, Lucy Farnik, Laurence Aitchison
81 17 0
29 Jan 2025

A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao
190 33 0
02 Jul 2024

Standards for Belief Representations in LLMs
Daniel A. Herrmann, B. Levinstein
99 11 0
31 May 2024