Scaling Granite Code Models to 128K Context

18 July 2024
Matt Stallone
Vaibhav Saxena
Leonid Karlinsky
Bridget McGinn
Tim Bula
Mayank Mishra
Adriana Meza Soria
Gaoyuan Zhang
Aditya Prasad
Yikang Shen
Saptha Surendran
Shanmukha C. Guttula
Hima Patel
Parameswaran Selvam
Xuan-Hong Dang
Yan Koyfman
Atin Sood
Rogerio Feris
Nirmit Desai
David D. Cox
Ruchir Puri
Rameswar Panda
Abstract

This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens. Our solution for scaling the context length of Granite 3B/8B code models from 2K/4K to 128K consists of lightweight continual pretraining that gradually increases the RoPE base frequency, combined with repository-level file packing and length-upsampled long-context data. We also release instruction-tuned models with long-context support, derived by further finetuning the long-context base models on a mix of permissively licensed short- and long-context instruction-response pairs. Compared to the original short-context Granite code models, our long-context models achieve significant improvements on long-context tasks without any noticeable performance degradation on regular code completion benchmarks (e.g., HumanEval). We release all our long-context Granite code models under an Apache 2.0 license for both research and commercial use.
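The abstract attributes the context extension to gradually increasing the RoPE base frequency during continual pretraining. The sketch below illustrates the general idea of that knob only: a larger base makes the rotary embeddings rotate more slowly, so positions far apart remain distinguishable at long context lengths. The specific base values, head dimension, and sequence lengths here are illustrative assumptions, not the paper's actual hyperparameters.

```python
import torch

def rope_frequencies(head_dim: int, base: float) -> torch.Tensor:
    """Per-pair rotation frequencies for rotary position embeddings (RoPE)."""
    return 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))

def rope_angles(seq_len: int, head_dim: int, base: float) -> torch.Tensor:
    """Rotation angle for every (position, frequency) pair, shape (seq_len, head_dim // 2)."""
    freqs = rope_frequencies(head_dim, base)
    positions = torch.arange(seq_len, dtype=torch.float32)
    return torch.outer(positions, freqs)

# Illustrative comparison (values are assumptions, not the paper's settings):
# a small base tuned for short contexts vs. a much larger base whose slower
# rotation keeps angles from wrapping around too quickly at 128K positions.
short_ctx_angles = rope_angles(seq_len=4096, head_dim=128, base=10_000.0)
long_ctx_angles = rope_angles(seq_len=131_072, head_dim=128, base=10_000_000.0)
```

In practice, such a base increase is paired with continued training on long sequences (here, repository-level packed and length-upsampled code data) so the model adapts to the new positional geometry rather than relying on the embedding change alone.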
