120

Emergence of Quantised Representations Isolated to Anisotropic Functions

Main:13 Pages
40 Figures
Bibliography:2 Pages
Appendix:26 Pages
Abstract

This paper describes a novel methodology for determining representational alignment, developed upon the existing Spotlight Resonance method. Using this, it is found that algebraic symmetries of network primitives are a strong predictor for task-agnostic structure in representations. Particularly, this new tool is used to gain insight into how discrete representations can form and arrange in autoencoder models, through an ablation study where only the activation function is altered. Representations are found to tend to discretise when the activation functions are defined through a discrete algebraic permutation-equivariant symmetry. In contrast, they remain continuous under a continuous algebraic orthogonal-equivariant definition. These findings corroborate the hypothesis that functional form choices can carry unintended inductive biases which produce task-independent artefactual structures in representations, particularly that contemporary forms induce discretisation of otherwise continuous structure -- a quantisation effect. Moreover, this supports a general causal model for one mode in which discrete representations may form, and could constitute a prerequisite for downstream interpretability phenomena, including grandmother neurons, discrete coding schemes, general linear features and possibly Superposition. Hence, this tool and proposed mechanism for the influence of functional form on representations may provide several insights into emergent interpretability research. Finally, preliminary results indicate that quantisation of representations appears to correlate with a measurable increase in reconstruction error, reinforcing previous conjectures that this collapse can be detrimental.

View on arXiv
Comments on this paper