arXiv:2409.17391
Scaling Behavior for Large Language Models regarding Numeral Systems: An Example using Pythia

25 September 2024
Zhejian Zhou
Jiayu Wang
Dahua Lin
Kai Chen
Abstract

Though Large Language Models (LLMs) have shown remarkable abilities in mathematical reasoning, they still struggle to perform numeric operations such as addition and multiplication accurately. Different LLMs tokenize numbers in different ways, and this choice affects performance on numeric operations. Two schemes are currently representative: 1) tokenizing numbers into single digits, and 2) tokenizing numbers into chunks of 1 to 3 digits. The difference is roughly equivalent to using different numeral systems (namely base 10 or base 10^3). In light of this, we study the scaling behavior of different numeral systems in the context of transformer-based large language models. We empirically show that a base 10 system is consistently more data-efficient than a base 10^2 or base 10^3 system across training data scales and model sizes under from-scratch training, while different numeral systems achieve very similar fine-tuning performance. We attribute this to the higher token frequencies of a base 10 system. Additionally, we reveal extrapolation behavior patterns on addition and multiplication: base 100 and base 1000 systems struggle with token-level discernment and token-level operations. We also shed light on the mechanisms learnt by the models.
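
To make the correspondence between tokenization and numeral system concrete, here is a minimal Python sketch (not taken from the paper; the function names are illustrative assumptions). Splitting an integer into single-digit tokens mirrors a base 10 system, while splitting it into chunks of up to three digits mirrors a base 1000 system.

```python
# Illustrative sketch only: single-digit tokenization ~ base 10,
# chunks of up to 3 digits ~ base 1000.

def tokenize_base10(number: int) -> list[str]:
    """Split a number into single-digit tokens (base-10 view)."""
    return list(str(number))

def tokenize_base1000(number: int) -> list[str]:
    """Split a number into chunks of up to 3 digits, starting from the
    most significant end (base-1000 view), e.g. 1234567 -> ['1', '234', '567']."""
    digits = str(number)
    head = len(digits) % 3 or 3  # length of the leading (possibly shorter) chunk
    chunks = [digits[:head]]
    chunks += [digits[i:i + 3] for i in range(head, len(digits), 3)]
    return chunks

if __name__ == "__main__":
    n = 1234567
    print(tokenize_base10(n))    # ['1', '2', '3', '4', '5', '6', '7']
    print(tokenize_base1000(n))  # ['1', '234', '567']
```

In this view, each chunk plays the role of a single "digit" in the larger base, which is why the base 10 scheme yields more tokens per number but a far smaller numeric vocabulary with higher per-token frequencies.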
