Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate
- MoE
Main: 10 pages
Bibliography: 2 pages
Appendix: 3 pages
11 figures
5 tables
Abstract
The prevailing paradigm for scaling large language models (LLMs) involves monolithic, end-to-end training, a resource-intensive process that lacks flexibility. This paper explores an alternative, constructive approach to model development, built upon the foundation of non-trainable, deterministic input embeddings. In prior work [1], we established that high-level semantic reasoning can emerge in Transformers using frozen embeddings derived from the visual structure of Unicode glyphs. Here, we demonstrate that this fixed representational substrate acts as a universal "docking port," enabling two powerful and efficient scaling paradigms: seamless modular composition and progressive layer-wise growth.
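As a rough illustration of the idea described in the abstract (not the paper's actual implementation), the sketch below freezes a deterministic embedding table and grows a Transformer by appending trainable blocks on top of the fixed substrate. All names (`GrowingTransformer`, `make_deterministic_embeddings`, `add_layer`) and the seeded-random embedding stand-in are hypothetical assumptions; the paper derives its substrate from Unicode glyph visuals.

```python
import torch
import torch.nn as nn

def make_deterministic_embeddings(vocab_size: int, d_model: int) -> nn.Embedding:
    """Hypothetical stand-in for the glyph-derived substrate: a fixed, seeded
    random table. The key property illustrated here is that it is non-trainable."""
    gen = torch.Generator().manual_seed(0)
    weight = torch.randn(vocab_size, d_model, generator=gen)
    return nn.Embedding.from_pretrained(weight, freeze=True)  # frozen substrate

class GrowingTransformer(nn.Module):
    """Sketch of progressive layer-wise growth on a frozen embedding 'docking port'."""
    def __init__(self, vocab_size: int, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.embed = make_deterministic_embeddings(vocab_size, d_model)  # never updated
        self.layers = nn.ModuleList()  # starts empty; grown over time
        self.d_model, self.n_heads = d_model, n_heads
        self.lm_head = nn.Linear(d_model, vocab_size)

    def add_layer(self) -> None:
        # Progressive growth: append a new trainable block; earlier blocks
        # (and the embedding substrate) can be left frozen if desired.
        self.layers.append(
            nn.TransformerEncoderLayer(self.d_model, self.n_heads, batch_first=True)
        )

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h = self.embed(token_ids)
        for layer in self.layers:
            h = layer(h)
        return self.lm_head(h)

# Usage: grow one block at a time, training only the newest parameters.
model = GrowingTransformer(vocab_size=1000)
model.add_layer()
logits = model(torch.randint(0, 1000, (2, 16)))  # (batch=2, seq=16, vocab)
```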
