What Causes Polysemanticity? An Alternative Origin Story of Mixed Selectivity from Incidental Causes

Versions: v1, v2, v3 (latest)

5 December 2023

Rylan Schaeffer

Naomi Bashkansky

Papers citing "What Causes Polysemanticity? An Alternative Origin Story of Mixed Selectivity from Incidental Causes"

Adversarial Attacks Leverage Interference Between Features in Superposition. Edward Stevinson, Lucas Prieto, Melih Barsbey, Tolga Birdal. AAML. 84 0 0. 13 Oct 2025
Expand Neurons, Not Parameters. Linghao Kong, Inimai Subramanian, Yonadav Shavit, Micah Adler, Dan Alistarh, Nir Shavit. 84 0 0. 06 Oct 2025
Negative Pre-activations Differentiate Syntax. Linghao Kong, Angelina Ning, Micah Adler, Nir Shavit. 84 0 0. 29 Sep 2025
Interpreting ResNet-based CLIP via Neuron-Attention Decomposition. Edmund Bu, Yossi Gandelsman. 177 0 0. 24 Sep 2025
Evaluating Sparse Autoencoders for Monosemantic Representation. Moghis Fereidouni, Muhammad Umair Haider, Peizhong Ju, A.B. Siddique. 132 0 0. 20 Aug 2025
On the Theoretical Understanding of Identifiable Sparse Autoencoders and Beyond. Jingyi Cui, Tao Gui, Yifei Wang, Yisen Wang. 124 2 0. 19 Jun 2025
A Closer Look at Multimodal Representation Collapse. Abhra Chaudhuri, Anjan Dutta, Tu Bui, Serban Georgescu. 217 6 0. 28 May 2025
On the creation of narrow AI: hierarchy and nonlocality of neural network skills. Eric J. Michaud, Asher Parker-Sartori, Max Tegmark. 366 2 0. 21 May 2025
Signal in the Noise: Polysemantic Interference Transfers and Predicts Cross-Model Influence. Bofan Gong, Shiyang Lai, James A. Evans, Dawn Song. AAML, MILM. 209 1 0. 16 May 2025
Towards Combinatorial Interpretability of Neural Computation. Micah Adler, Dan Alistarh, Nir Shavit. FAtt. 712 7 0. 10 Apr 2025
Make Haste Slowly: A Theory of Emergent Structured Mixed Selectivity in Feature Learning ReLU Networks. International Conference on Learning Representations (ICLR), 2025. Devon Jarvis, Richard Klein, Benjamin Rosman, Andrew M. Saxe. MLT. 324 2 0. 08 Mar 2025
Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness. International Conference on Learning Representations (ICLR), 2024. Qi Zhang, Yifei Wang, Jingyi Cui, Xiang Pan, Qi Lei, Stefanie Jegelka, Yisen Wang. AAML. 262 4 0. 27 Oct 2024
Towards Uncovering How Large Language Model Works: An Explainability Perspective. Haiyan Zhao, Fan Yang, Bo Shen, Himabindu Lakkaraju, Jundong Li. 278 23 0. 16 Feb 2024