TigerLLM -- A Family of Bangla Large Language Models

14 March 2025

Abstract

The development of Large Language Models (LLMs) remains heavily skewed towards English and a few other high-resource languages. This linguistic disparity is particularly evident for Bangla, the fifth most spoken language. A few initiatives have attempted to create open-source Bangla LLMs, but their performance still lags behind that of models for high-resource languages, and their reproducibility is limited. To address this gap, we introduce TigerLLM, a family of Bangla LLMs. Our results demonstrate that these models surpass all open-source alternatives and also outperform larger proprietary models like GPT-3.5 across standard benchmarks, establishing TigerLLM as the new baseline for future Bangla language modeling.

@article{raihan2025_2503.10995,
  title={TigerLLM -- A Family of Bangla Large Language Models},
  author={Nishat Raihan and Marcos Zampieri},
  journal={arXiv preprint arXiv:2503.10995},
  year={2025}
}
