Transformers know more than they can tell -- Learning the Collatz sequence

Main text: 23 pages, 9 figures, 7 tables. Bibliography: 5 pages.
Abstract

We investigate transformer prediction of long Collatz steps, a complex arithmetic function that maps odd integers to their distant successors in the Collatz sequence ($u_{n+1}=u_n/2$ if $u_n$ is even, $u_{n+1}=(3u_n+1)/2$ if $u_n$ is odd). Model accuracy varies with the base used to encode input and output: it can be as high as $99.7\%$ for bases $24$ and $32$, and as low as $37\%$ and $25\%$ for bases $11$ and $3$. Yet all models, no matter the base, follow a common learning pattern. As training proceeds, they learn a sequence of classes of inputs that share the same residue modulo $2^p$. Models achieve near-perfect accuracy on these classes, and less than $1\%$ on all other inputs. This maps to a mathematical property of Collatz sequences: the length of the loops involved in computing a long Collatz step can be deduced from the binary representation of its input. The learning pattern reflects the model learning to predict inputs associated with increasing loop lengths. An analysis of failure cases reveals that almost all model errors follow predictable patterns. Hallucination, a common feature of large language models, almost never happens: in over $90\%$ of failures, the model performs the correct calculation but wrongly estimates the loop lengths. Our observations give a full account of the algorithms learned by the models. They suggest that the difficulty of learning such complex arithmetic functions lies in figuring out the control structure of the computation -- the length of the loops. We believe that the approach outlined here, using mathematical problems as tools for understanding, explaining, and perhaps improving language models, can be applied to a broad range of problems and bear fruitful results.
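The accelerated Collatz map defined in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `collatz_step` and `long_collatz_step`, and the choice of iterating a fixed number of steps `k`, are assumptions for demonstration, not the paper's exact definition of a "long step".

```python
def collatz_step(u: int) -> int:
    """One step of the accelerated Collatz map:
    u -> u/2 if u is even, u -> (3u+1)/2 if u is odd."""
    return u // 2 if u % 2 == 0 else (3 * u + 1) // 2


def long_collatz_step(u: int, k: int = 4) -> int:
    """Map an integer to its successor k steps further along the
    sequence (k=4 is a hypothetical choice for illustration)."""
    for _ in range(k):
        u = collatz_step(u)
    return u


# Example trajectory from the odd input 7: 7 -> 11 -> 17 -> 26 -> 13
print(long_collatz_step(7))  # 13
```

Note that the loop structure of the computation depends on the parity of each intermediate value, which is exactly the information the abstract says is encoded in the binary representation of the input.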
