ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2504.05747
26
11

SEA-LION: Southeast Asian Languages in One Network

8 April 2025
Raymond Ng
Thanh Ngan Nguyen
Yuli Huang
Ngee Chia Tai
Wai Yi Leong
Wei Qi Leong
Xianbin Yong
Jian Gang Ngui
Yosephine Susanto
Nicholas Cheng
Hamsawardhini Rengarajan
Peerat Limkonchotiwat
Adithya Venkatadri Hulagadri
Kok Wai Teng
Yeo Yeow Tong
Bryan Siow
Wei Yi Teo
Wayne Lau
Choon Meng Tan
Brandon Ong
Zhi Hao Ong
Jann Railey Montalan
Adwin Chan
Sajeban Antonyrex
Ren Lee
Esther Choa
David Ong Tat-Wee
B. Liu
William-Chandra Tjhi
Erik Cambria
Leslie Teo
ArXivPDFHTML
Abstract

Recently, Large Language Models (LLMs) have dominated much of the artificial intelligence scene with their ability to process and generate natural languages. However, the majority of LLM research and development remains English-centric, leaving low-resource languages such as those in the Southeast Asian (SEA) region under-represented. To address this representation gap, we introduce Llama-SEA-LION-v3-8B-IT and Gemma-SEA-LION-v3-9B-IT, two cutting-edge multilingual LLMs designed for SEA languages. The SEA-LION family of LLMs supports 11 SEA languages, namely English, Chinese, Indonesian, Vietnamese, Malay, Thai, Burmese, Lao, Filipino, Tamil, and Khmer. Our work leverages large-scale multilingual continued pre-training with a comprehensive post-training regime involving multiple stages of instruction fine-tuning, alignment, and model merging. Evaluation results on multilingual benchmarks indicate that our models achieve state-of-the-art performance across LLMs supporting SEA languages. We open-source the models to benefit the wider SEA community.

View on arXiv
@article{ng2025_2504.05747,
  title={ SEA-LION: Southeast Asian Languages in One Network },
  author={ Raymond Ng and Thanh Ngan Nguyen and Yuli Huang and Ngee Chia Tai and Wai Yi Leong and Wei Qi Leong and Xianbin Yong and Jian Gang Ngui and Yosephine Susanto and Nicholas Cheng and Hamsawardhini Rengarajan and Peerat Limkonchotiwat and Adithya Venkatadri Hulagadri and Kok Wai Teng and Yeo Yeow Tong and Bryan Siow and Wei Yi Teo and Wayne Lau and Choon Meng Tan and Brandon Ong and Zhi Hao Ong and Jann Railey Montalan and Adwin Chan and Sajeban Antonyrex and Ren Lee and Esther Choa and David Ong Tat-Wee and Bing Jie Darius Liu and William Chandra Tjhi and Erik Cambria and Leslie Teo },
  journal={arXiv preprint arXiv:2504.05747},
  year={ 2025 }
}
Comments on this paper