Scalable Complexity Control Facilitates Reasoning Ability of LLMs

29 May 2025
Liangkai Hang
Junjie Yao
Zhiwei Bai
Tianyi Chen
Yang Chen
Rongjie Diao
Hezhou Li
Pengxiao Lin
Zhiwei Wang
Cheng Xu
Zhongwang Zhang
Zhangchen Zhou
Zhiyu Li
Zehao Lin
Kai Chen
Feiyu Xiong
Yaoyu Zhang
Weinan E
Hongkang Yang
Zhi-Qin John Xu
Main: 9 pages · 17 figures · 9 tables · Bibliography: 6 pages · Appendix: 7 pages
Abstract

The reasoning ability of large language models (LLMs) has advanced rapidly in recent years, attracting interest in more fundamental approaches that can reliably enhance their generalizability. This work demonstrates that model complexity control, conveniently implementable by adjusting the initialization rate and the weight decay coefficient, consistently improves the scaling law of LLMs across varying model sizes and data sizes. The gain is further illustrated by comparing the benchmark performance of 2.4B models pretrained on 1T tokens with different complexity hyperparameters. Rather than fixing the initialization std, we find that keeping the initialization rate (the exponent of the std) constant lets the scaling law descend faster in both model size and data size. These results indicate that complexity control is a promising direction for the continual advancement of LLMs.
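As a rough illustration of the two complexity-control knobs named in the abstract, the sketch below re-initializes each linear layer with std = fan_in ** (-gamma), so that the initialization rate gamma (the exponent of the std), rather than the std itself, is held fixed across widths, and pairs it with a weight decay coefficient in the optimizer. The exact scaling form, the gamma and weight-decay values, and the helper name init_with_rate are illustrative assumptions, not the paper's actual recipe.

# Minimal sketch (not the authors' code) of complexity control via the
# initialization rate gamma and the weight decay coefficient.
# Assumption: each Linear layer's init std is fan_in ** (-gamma), so gamma
# stays constant as model width changes.
import torch
import torch.nn as nn

def init_with_rate(model: nn.Module, gamma: float) -> None:
    """Re-initialize every Linear layer with std = fan_in ** (-gamma)."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            fan_in = module.in_features
            std = fan_in ** (-gamma)
            nn.init.normal_(module.weight, mean=0.0, std=std)
            if module.bias is not None:
                nn.init.zeros_(module.bias)

# Example: a toy MLP standing in for an LLM block; gamma and weight_decay
# are the two complexity hyperparameters discussed in the abstract.
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
init_with_rate(model, gamma=0.75)  # hypothetical value, for illustration only
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)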

@article{hang2025_2505.23013,
  title={Scalable Complexity Control Facilitates Reasoning Ability of LLMs},
  author={Liangkai Hang and Junjie Yao and Zhiwei Bai and Tianyi Chen and Yang Chen and Rongjie Diao and Hezhou Li and Pengxiao Lin and Zhiwei Wang and Cheng Xu and Zhongwang Zhang and Zhangchen Zhou and Zhiyu Li and Zehao Lin and Kai Chen and Feiyu Xiong and Yaoyu Zhang and Weinan E and Hongkang Yang and Zhi-Qin John Xu},
  journal={arXiv preprint arXiv:2505.23013},
  year={2025}
}