Measuring The Impact Of Programming Language Distribution

3 February 2023
Gabriel Orlanski
Kefan Xiao
Xavier Garcia
Jeffrey Hui
Joshua Howland
J. Malmaud
Jacob Austin
Rishabh Singh
Michele Catasta
Abstract

Current benchmarks for evaluating neural code models focus on only a small subset of programming languages, excluding many popular languages such as Go or Rust. To ameliorate this issue, we present the BabelCode framework for execution-based evaluation of any benchmark in any language. BabelCode enables new investigations into the qualitative performance of models' memory, runtime, and individual test case results. Additionally, we present a new code translation dataset called Translating Python Programming Puzzles (TP3) from the Python Programming Puzzles (Schuster et al. 2021) benchmark that involves translating expert-level Python functions to any language. With both BabelCode and the TP3 benchmark, we investigate if balancing the distributions of 14 languages in a training dataset improves a large language model's performance on low-resource languages. Training a model on a balanced corpus results in, on average, 12.34% higher pass@k across all tasks and languages compared to the baseline. We find that this strategy achieves 66.48% better pass@k on low-resource languages at the cost of only a 12.94% decrease to high-resource languages. In our three translation tasks, this strategy yields, on average, 30.77% better low-resource pass@k while having 19.58% worse high-resource pass@k.
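
The results above are reported in terms of pass@k, the probability that at least one of k sampled programs for a problem passes all of its test cases. The abstract does not spell out how the metric is computed, but the standard unbiased estimator from Chen et al. (2021) is the usual choice for execution-based benchmarks of this kind; the sketch below assumes that estimator, and the function name and example numbers are illustrative rather than taken from the paper.

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total samples generated for a problem
    c: number of those samples that pass all test cases
    k: sampling budget being evaluated
    """
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    # 1 - C(n - c, k) / C(n, k), computed as a numerically stable product.
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Hypothetical example: 200 samples per problem, 35 passing, evaluated at k = 10.
print(pass_at_k(n=200, c=35, k=10))  # approx. 0.86

The per-language pass@k values are then averaged over all problems in a benchmark, which is how relative changes such as the reported 12.34% improvement are compared between the balanced and baseline training corpora.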
