Toward Mechanistic Explanation of Deductive Reasoning in Language Models

10 October 2025

Davide Maltoni


Main: 12 pages, 10 figures, 1 table. Bibliography: 3 pages.

Abstract

Recent large language models have demonstrated notable capabilities in solving problems that require logical reasoning; however, the internal mechanisms behind these capabilities remain largely unexplored. In this paper, we show that a small language model can solve a deductive reasoning task by learning the underlying rules (rather than operating as a statistical learner). We then provide a low-level explanation of its internal representations and computational circuits. Our findings reveal that induction heads play a central role in implementing the rule completion and rule chaining steps involved in the logical inference required by the task.
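To make the rule completion and rule chaining steps concrete, the following is a minimal sketch in plain Python, not the paper's model or circuit. It assumes a hypothetical context format in which each rule is written as `premise -> conclusion ;`, and it imitates the lookup pattern usually attributed to induction heads: find an earlier occurrence of the current token and copy what followed it. The names `encode`, `complete_rule`, and `chain_rules` are illustrative and do not come from the paper.

```python
# Hypothetical token format for rules in the context: premise, ARROW, conclusion, SEP.
ARROW, SEP = "->", ";"


def encode(rules):
    """Flatten (premise, conclusion) pairs into a single token sequence."""
    tokens = []
    for premise, conclusion in rules:
        tokens += [premise, ARROW, conclusion, SEP]
    return tokens


def complete_rule(tokens, query):
    """Rule completion as an induction-style lookup.

    Scan backward for the most recent position where `query` appears as a
    premise (that is, directly followed by ARROW) and copy the conclusion.
    Returns None if no rule in the context has `query` as its premise.
    """
    for i in range(len(tokens) - 3, -1, -1):
        if tokens[i] == query and tokens[i + 1] == ARROW:
            return tokens[i + 2]
    return None


def chain_rules(tokens, start, max_steps=10):
    """Rule chaining: feed each completed conclusion back in as the next query."""
    path = [start]
    current = start
    for _ in range(max_steps):
        nxt = complete_rule(tokens, current)
        if nxt is None or nxt in path:  # stop on dead ends and on cycles
            break
        path.append(nxt)
        current = nxt
    return path


# Rules are deliberately listed out of order to show that chaining
# depends on matching premises, not on position in the context.
context = encode([("A", "B"), ("C", "D"), ("B", "C")])
print(chain_rules(context, "A"))
```

In a transformer, the backward scan would correspond to an attention pattern and the copy to the head's output; the loop here only mirrors the logical structure of the two steps under the assumed format.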
