JABER and SABER: Junior and Senior Arabic BERt
Abbas Ghaddar
Yimeng Wu
Ahmad Rashid
Khalil Bibi
Mehdi Rezagholizadeh
Chao Xing
Yasheng Wang
Duan Xinyu
Zhefeng Wang
Baoxing Huai
Xin Jiang
Qun Liu
Philippe Langlais

Abstract
Language-specific pre-trained models have proven to be more accurate than multilingual ones in a monolingual evaluation setting, and Arabic is no exception. However, we found that previously released Arabic BERT models were significantly under-trained. In this technical report, we present JABER and SABER, Junior and Senior Arabic BERt respectively, our pre-trained language model prototypes dedicated to Arabic. We conduct an empirical study to systematically evaluate the performance of models across a diverse set of existing Arabic NLU tasks. Experimental results show that JABER and SABER achieve state-of-the-art performance on ALUE, a new benchmark for Arabic Language Understanding Evaluation, as well as on a well-established NER benchmark.