TorchScale: Transformers at Scale

23 November 2022
Shuming Ma, Hongyu Wang, Shaohan Huang, Wenhui Wang, Zewen Chi, Li Dong, Alon Benhaim, Barun Patra, Vishrav Chaudhary, Xia Song, Furu Wei
Abstract

Large Transformers have achieved state-of-the-art performance across many tasks. Most open-source libraries for scaling Transformers focus on improving training or inference with better parallelization. In this work, we present TorchScale, an open-source toolkit that allows researchers and developers to scale up Transformers efficiently and effectively. TorchScale implements several modeling techniques that improve modeling generality and capability, as well as training stability and efficiency. Experimental results on language modeling and neural machine translation demonstrate that TorchScale can successfully scale Transformers to different sizes without tears. The library is available at https://aka.ms/torchscale.
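
The abstract does not show how the toolkit is used, but the public repository documents a config-then-model pattern for building architectures. The sketch below illustrates that pattern; the class names (EncoderConfig, Encoder) and the deepnorm flag are assumptions based on the repository's README and may differ across versions.

# Minimal sketch (not from the paper): building a Transformer encoder with
# TorchScale. Assumes `pip install torchscale`; class and flag names are
# assumptions based on the public repository and may differ across versions.
from torchscale.architecture.config import EncoderConfig
from torchscale.architecture.encoder import Encoder

# Architecture and stability options live on a config object; the
# stability techniques mentioned in the abstract are assumed to be
# exposed as boolean flags such as the one below.
config = EncoderConfig(
    vocab_size=64000,   # size of the token vocabulary
    deepnorm=True,      # assumed flag for DeepNet-style residual scaling
)

# Instantiate the encoder and inspect its layers.
model = Encoder(config)
print(model)

Decoder-only and encoder-decoder variants are presumably constructed the same way from their respective config classes.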
