ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

RETROcode: Leveraging a Code Database for Improved Natural Language to Code Generation

8 April 2025
Nathanaël Beau
Benoît Crabbé
Abstract

As text and code resources have expanded, large-scale pre-trained models have shown promising capabilities in code generation tasks, typically employing supervised fine-tuning on pairs of problem statements and programs. However, increasing model size and data volume for performance gains also raises computational demands and the risk of overfitting. To address these challenges, we present RETROcode, a novel adaptation of the RETRO architecture (Borgeaud et al., 2022) for sequence-to-sequence models, which uses a large code database as an auxiliary scaling method. Rather than simply enlarging model and dataset sizes, RETROcode leverages a vast code database at prediction time, improving the model's efficiency by integrating extensive external memory. Our findings indicate that RETROcode not only outperforms similar-sized traditional architectures on test sets but also approaches the effectiveness of the much larger Codex model, despite being trained from scratch on a substantially smaller dataset.
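The core idea the abstract describes — consulting a large code database as external memory at prediction time — rests on a nearest-neighbour retrieval step. The sketch below is a toy illustration of that step only, not the paper's method: the `CodeDatabase` class and `retrieve` function are hypothetical names, and the bag-of-words "embedding" stands in for the learned encoder a RETRO-style model would actually use.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; RETRO-style systems use a learned neural encoder."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class CodeDatabase:
    """Illustrative snippet store: pre-embeds snippets, returns the k nearest to a query."""

    def __init__(self, snippets):
        self.snippets = snippets
        self.vectors = [embed(s) for s in snippets]

    def retrieve(self, query, k=2):
        q = embed(query)
        scored = sorted(
            zip((cosine(q, v) for v in self.vectors), self.snippets),
            reverse=True,
        )
        return [snippet for _, snippet in scored[:k]]

# Usage: the retrieved neighbours would be fed to the generator as extra context.
db = CodeDatabase([
    "def add(a, b): return a + b",
    "def sort_list(xs): return sorted(xs)",
    "def read_file(path): return open(path).read()",
])
neighbours = db.retrieve("sort a list of numbers xs sorted", k=1)
```

In the full architecture, these retrieved chunks are attended to by the decoder (via cross-attention in RETRO), so the model can copy or adapt relevant code rather than memorize it in its parameters.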

@article{beau2025_2504.05759,
  title={RETROcode: Leveraging a Code Database for Improved Natural Language to Code Generation},
  author={Nathanaël Beau and Benoît Crabbé},
  journal={arXiv preprint arXiv:2504.05759},
  year={2025}
}