Disentangling Polysemantic Channels in Convolutional Neural Networks

17 April 2025
Robin Hesse
Jonas Fischer
Simone Schaub-Meyer
Stefan Roth
    FAtt
    MILM
Abstract

Mechanistic interpretability is concerned with analyzing individual components of a (convolutional) neural network (CNN) and how they form larger circuits that represent decision mechanisms. These investigations are challenging because CNNs frequently learn polysemantic channels that encode multiple distinct concepts, making them hard to interpret. To address this, we propose an algorithm that disentangles a specific kind of polysemantic channel into multiple channels, each responding to a single concept. Our approach restructures the weights of a CNN, exploiting the fact that different concepts within the same channel exhibit distinct activation patterns in the previous layer. By disentangling these polysemantic features, we enhance the interpretability of CNNs, ultimately improving explanatory techniques such as feature visualizations.
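The abstract describes the approach only at a high level. Below is a minimal, hypothetical sketch of what such a channel split could look like in PyTorch, assuming post-ReLU activations, exactly two concepts separated by k-means over spatially averaged previous-layer patterns, and a split implemented by soft-masking the channel's incoming weights. The function name and every detail are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
from sklearn.cluster import KMeans


@torch.no_grad()
def split_polysemantic_channel(conv: nn.Conv2d, acts_prev: torch.Tensor,
                               channel: int, n_concepts: int = 2) -> nn.Conv2d:
    """Return a copy of `conv` in which `channel` is replaced by `n_concepts`
    channels, each biased toward one cluster of previous-layer patterns.

    acts_prev: (N, C_in, H, W) activations feeding `conv`, sampled at inputs
    where `channel` responds strongly. (Hypothetical interface.)
    """
    # 1. Summarize each sample's previous-layer pattern by spatial averaging.
    patterns = acts_prev.detach().mean(dim=(2, 3)).cpu().numpy()   # (N, C_in)

    # 2. Cluster the patterns; each cluster is treated as one concept.
    labels = KMeans(n_clusters=n_concepts, n_init=10).fit_predict(patterns)

    # 3. For each concept, derive a soft mask over input channels from the
    #    cluster's mean activation profile and apply it to the incoming
    #    weights of the polysemantic channel.
    w = conv.weight                                                 # (C_out, C_in, k, k)
    new_rows = []
    for c in range(n_concepts):
        profile = torch.from_numpy(patterns[labels == c].mean(axis=0)).float()
        mask = profile / (profile.abs().max() + 1e-8)               # (C_in,)
        new_rows.append(w[channel] * mask.view(-1, 1, 1))

    # 4. Assemble a conv layer with the split channels appended at the end.
    keep = [i for i in range(w.shape[0]) if i != channel]
    new_conv = nn.Conv2d(w.shape[1], len(keep) + n_concepts, conv.kernel_size,
                         stride=conv.stride, padding=conv.padding,
                         bias=conv.bias is not None)
    new_conv.weight.copy_(torch.cat([w[keep], torch.stack(new_rows)], dim=0))
    if conv.bias is not None:
        new_conv.bias.copy_(torch.cat([conv.bias[keep],
                                       conv.bias[channel].repeat(n_concepts)], dim=0))
    return new_conv

Note that, to preserve the network's behavior, the weights of the following layer that read from the split channel would also have to be duplicated across the new channels; the sketch omits that step, as well as how polysemantic channels are identified in the first place.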

View on arXiv: https://arxiv.org/abs/2504.12939
@article{hesse2025_2504.12939,
  title   = {Disentangling Polysemantic Channels in Convolutional Neural Networks},
  author  = {Robin Hesse and Jonas Fischer and Simone Schaub-Meyer and Stefan Roth},
  journal = {arXiv preprint arXiv:2504.12939},
  year    = {2025}
}