Fine Tuning Methods for Low-resource Languages

Main: 18 pages, 11 figures, 4 tables
Bibliography: 2 pages
Abstract

The rise of Large Language Models has not been inclusive of all cultures. These models are trained predominantly on English text and Anglophone cultural material, which causes them to underperform in other languages and cultural contexts. By developing a generalizable method for preparing culturally relevant datasets and post-training the Gemma 2 model, this project aimed to improve Gemma 2's performance on an underrepresented language and to show how others can do the same to unlock the power of generative AI in their own countries and preserve their cultural heritage.
