Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate
- MoE
Main: 10 pages
Bibliography: 2 pages
Appendix: 3 pages
11 figures
5 tables
Abstract
The prevailing paradigm for scaling large language models (LLMs) involves monolithic, end-to-end training, a resource-intensive process that lacks flexibility. This paper explores an alternative, constructive approach to model development, built upon the foundation of non-trainable, deterministic input embeddings. In prior work [1], we established that high-level semantic reasoning can emerge in Transformers using frozen embeddings derived from the visual structure of Unicode glyphs. Here, we demonstrate that this fixed representational substrate acts as a universal "docking port," enabling two powerful and efficient scaling paradigms: seamless modular composition and progressive layer-wise growth.
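As a rough illustration of the idea described in the abstract (not the paper's actual implementation), the sketch below freezes a deterministic embedding table and grows a Transformer by appending trainable blocks on top of the fixed substrate. All names (`GrowingTransformer`, `make_deterministic_embeddings`, `add_layer`) and the seeded-random embedding stand-in are hypothetical assumptions; the paper derives its substrate from Unicode glyph visuals.

```python
import torch
import torch.nn as nn

def make_deterministic_embeddings(vocab_size: int, d_model: int) -> nn.Embedding:
    """Hypothetical stand-in for the glyph-derived substrate: a fixed, seeded
    random table. The key property illustrated here is that it is non-trainable."""
    gen = torch.Generator().manual_seed(0)
    weight = torch.randn(vocab_size, d_model, generator=gen)
    return nn.Embedding.from_pretrained(weight, freeze=True)  # frozen substrate

class GrowingTransformer(nn.Module):
    """Sketch of progressive layer-wise growth on a frozen embedding 'docking port'."""
    def __init__(self, vocab_size: int, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.embed = make_deterministic_embeddings(vocab_size, d_model)  # never updated
        self.layers = nn.ModuleList()  # starts empty; grown over time
        self.d_model, self.n_heads = d_model, n_heads
        self.lm_head = nn.Linear(d_model, vocab_size)

    def add_layer(self) -> None:
        # Progressive growth: append a new trainable block; earlier blocks
        # (and the embedding substrate) can be left frozen if desired.
        self.layers.append(
            nn.TransformerEncoderLayer(self.d_model, self.n_heads, batch_first=True)
        )

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h = self.embed(token_ids)
        for layer in self.layers:
            h = layer(h)
        return self.lm_head(h)

# Usage: grow one block at a time, training only the newest parameters.
model = GrowingTransformer(vocab_size=1000)
model.add_layer()
logits = model(torch.randint(0, 1000, (2, 16)))  # (batch=2, seq=16, vocab)
```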
