arXiv:2207.11280

PanGu-Coder: Program Synthesis with Function-Level Language Modeling

22 July 2022
Fenia Christopoulou, Gerasimos Lampouras, Milan Gritta, Guchun Zhang, Yinpeng Guo, Zhong-yi Li, Qi Zhang, M. Xiao, Bo Shen, Lin Li, Hao Yu, Li-yu Yan, Pingyi Zhou, Xin Wang, Yuchi Ma, Ignacio Iacobacci, Yasheng Wang, Guangtai Liang, Jia Wei, Xin Jiang, Qianxiang Wang, Qun Liu
Abstract

We present PanGu-Coder, a pretrained decoder-only language model adopting the PanGu-Alpha architecture for text-to-code generation, i.e., the synthesis of programming language solutions given a natural language problem description. We train PanGu-Coder using a two-stage strategy: the first stage employs Causal Language Modelling (CLM) to pre-train on raw programming language data, while the second stage uses a combination of Causal Language Modelling and Masked Language Modelling (MLM) training objectives that focus on the downstream task of text-to-code generation, training on loosely curated pairs of natural language program definitions and code functions. Finally, we discuss PanGu-Coder-FT, which is fine-tuned on a combination of competitive programming problems and code with continuous integration tests. We evaluate PanGu-Coder with a focus on whether it generates functionally correct programs, and demonstrate that it achieves equivalent or better performance than similarly sized models, such as CodeX, while attending over a smaller context window and training on less data.
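The task described here is function-level text-to-code generation: the model receives a natural language problem description (typically a function signature plus docstring) and completes the function body. As a rough illustration only, the following sketch shows how such a decoder-only model could be prompted through the Hugging Face transformers API; the checkpoint name is a hypothetical placeholder, since the abstract does not mention a publicly released PanGu-Coder checkpoint.

# Minimal sketch of function-level text-to-code prompting with a
# decoder-only causal language model. The checkpoint name is a
# placeholder, not an official PanGu-Coder release.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "your-org/pangu-coder-like-model"  # hypothetical
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Natural language problem description framed as signature + docstring.
prompt = (
    "def add_numbers(a, b):\n"
    '    """Return the sum of a and b."""\n'
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,  # greedy decoding for a deterministic completion
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))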

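Evaluating for functional correctness, rather than textual similarity, generally means executing each generated function against unit tests and counting a problem as solved only if the tests pass (the pass@k convention used by benchmarks such as HumanEval). The sketch below is a simplified, hypothetical harness illustrating that idea; a real harness would sandbox and time-limit the execution of model-generated code.

# Simplified, hypothetical functional-correctness check: a candidate
# completion counts as correct only if its unit tests all pass.
def passes_tests(candidate_source: str, test_source: str) -> bool:
    """Execute a generated function together with its unit tests."""
    namespace = {}
    try:
        exec(candidate_source, namespace)  # define the generated function
        exec(test_source, namespace)       # assertions raise on failure
        return True
    except Exception:
        return False

candidate = (
    "def add_numbers(a, b):\n"
    "    return a + b\n"
)
tests = (
    "assert add_numbers(1, 2) == 3\n"
    "assert add_numbers(-1, 1) == 0\n"
)

print(passes_tests(candidate, tests))  # True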