Analyze the Neurons, not the Embeddings: Understanding When and Where LLM Representations Align with Humans

20 February 2025
Masha Fedzechkina
Eleonora Gualdoni
Sinead Williamson
Katherine Metcalf
Skyler Seto
Barry-John Theobald
Abstract

Modern large language models (LLMs) achieve impressive performance on some tasks, while exhibiting distinctly non-human-like behaviors on others. This raises the question of how well the LLM's learned representations align with human representations. In this work, we introduce a novel approach to the study of representation alignment: we adopt a method from research on activation steering to identify neurons responsible for specific concepts (e.g., 'cat') and then analyze the corresponding activation patterns. Our findings reveal that LLM representations closely align with human representations inferred from behavioral data. Notably, this alignment surpasses that of word embeddings, which have been center stage in prior work on human and model alignment. Additionally, our approach enables a more granular view of how LLMs represent concepts. Specifically, we show that LLMs organize concepts in a way that reflects hierarchical relationships interpretable to humans (e.g., 'animal'-'dog').
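The abstract does not specify the alignment metric or the exact neuron-selection procedure, so the following is only a minimal illustrative sketch. It assumes an RSA-style comparison (Spearman correlation between model and human representational dissimilarity matrices) and uses a crude variance-based neuron filter as a stand-in for the paper's activation-steering-based concept-neuron identification; the concept list, layer width, and data in the example are placeholders.

```python
# Hypothetical sketch: compare concept-level neuron activations with human
# similarity data via representational similarity analysis (RSA).
# Everything here (neuron selection rule, metric, synthetic data) is an
# assumption for illustration, not the paper's actual procedure.

import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)

concepts = ["cat", "dog", "animal", "car", "truck", "vehicle"]
n_neurons = 4096  # e.g., the MLP width of one transformer layer (assumed)

# Placeholder activations: rows = concepts, columns = neurons.
# In practice these would come from the LLM's hidden states when the
# model is prompted with each concept.
activations = rng.normal(size=(len(concepts), n_neurons))

# Placeholder human dissimilarity matrix, standing in for one inferred
# from behavioral similarity judgments.
human_dissim = squareform(pdist(rng.normal(size=(len(concepts), 8))))


def select_concept_neurons(acts: np.ndarray, top_k: int = 256) -> np.ndarray:
    """Keep the neurons whose activations vary most across concepts,
    a crude stand-in for steering-based concept-neuron identification."""
    variance = acts.var(axis=0)
    top = np.argsort(variance)[-top_k:]
    return acts[:, top]


def rsa_alignment(model_acts: np.ndarray, human_rdm: np.ndarray) -> float:
    """Spearman correlation between the upper triangles of the model and
    human representational dissimilarity matrices."""
    model_rdm = squareform(pdist(model_acts, metric="cosine"))
    iu = np.triu_indices(len(human_rdm), k=1)
    rho, _ = spearmanr(model_rdm[iu], human_rdm[iu])
    return rho


neuron_acts = select_concept_neurons(activations)
print(f"RSA alignment (neurons): {rsa_alignment(neuron_acts, human_dissim):.3f}")
```

With real activations, the same comparison could be run on word embeddings in place of the selected neuron activations to reproduce the kind of neurons-versus-embeddings contrast the abstract describes.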

@article{fedzechkina2025_2502.15090,
  title={Analyze the Neurons, not the Embeddings: Understanding When and Where LLM Representations Align with Humans},
  author={Masha Fedzechkina and Eleonora Gualdoni and Sinead Williamson and Katherine Metcalf and Skyler Seto and Barry-John Theobald},
  journal={arXiv preprint arXiv:2502.15090},
  year={2025}
}