INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge

29 November 2024
Angelika Romanou
Negar Foroutan
Anna Sotnikova
Zeming Chen
Sree Harsha Nelaturu
S. Kamath S
Rishabh Maheshwary
Micol Altomare
Mohamed A. Haggag
S. Alizadeh
Alfonso Amayuelas
Azril Amirudin
Viraat Aryabumi
Danylo Boiko
Michael Chang
Jenny Chim
Gal Cohen
Aditya Kumar Dalmia
Abraham Diress
Sharad Duwal
Daniil Dzenhaliou
Daniel Fernando Erazo Florez
Fabian Farestam
Joseph Marvin Imperial
Shayekh Bin Islam
Perttu Isotalo
Maral Jabbarishiviari
Börje F. Karlsson
Eldar Khalilov
Christopher Klamm
Fajri Koto
Dominik Krzemiński
Gabriel Adriano de Melo
Syrielle Montariol
Yiyang Nan
Joel Niklaus
Jekaterina Novikova
Johan Obando-Ceron
Debjit Paul
Esther Ploeger
Jebish Purbey
Swati Rajwal
Selvan Sunitha Ravi
Sara Rydell
Roshan Santhosh
Drishti Sharma
Marjana Prifti Skenduli
Arshia Soltani Moakhar
Bardia Soltani Moakhar
Ran Tamir
Ayush Kumar Tarun
Azmine Toushik Wasi
Thenuka Ovin Weerasinghe
Serhan Yilmaz
Mike Zhang
Imanol Schlag
Marzieh Fadaee
Sara Hooker
Antoine Bosselut
arXiv:2411.19799
Abstract

The performance gap of large language models (LLMs) across languages hinders their effective deployment in many regions, inhibiting the potential economic and societal value of generative AI tools in many communities. However, the development of functional LLMs in many languages (i.e., multilingual LLMs) is bottlenecked by the lack of high-quality evaluation resources in languages other than English. Moreover, current practices in multilingual benchmark construction often translate English resources, ignoring the regional and cultural knowledge of the environments in which multilingual systems would be used. In this work, we construct an evaluation suite of 197,243 QA pairs from local exam sources to measure the capabilities of multilingual LLMs in a variety of regional contexts. Our novel resource, INCLUDE, is a comprehensive knowledge- and reasoning-centric benchmark across 44 written languages that evaluates multilingual LLMs for performance in the actual language environments where they would be deployed.
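Since the benchmark consists of exam-style QA pairs, evaluation reduces to scoring a model's multiple-choice predictions per language. The sketch below illustrates one plausible way to consume such a resource with the Hugging Face `datasets` library; the dataset identifier `CohereForAI/include-base-44`, the language config name, and the field names (`question`, `choices`, `answer` as a choice index) are assumptions for illustration, not details confirmed by the abstract.

```python
# Minimal sketch: accuracy of a predictor on one language split of an
# INCLUDE-style multiple-choice benchmark. Dataset id, config, and field
# names below are assumptions, not confirmed by the paper's abstract.
from datasets import load_dataset

LETTERS = "ABCD"

def score(predict, dataset_id="CohereForAI/include-base-44",
          config="Greek", split="test"):
    """Accuracy of `predict(question, choices) -> letter` on one language."""
    ds = load_dataset(dataset_id, config, split=split)  # hypothetical id/config
    correct = 0
    for ex in ds:
        gold = LETTERS[int(ex["answer"])]  # assumed: gold answer is an index
        if predict(ex["question"], ex["choices"]) == gold:
            correct += 1
    return correct / len(ds)

# Trivial baseline that always answers "A" -- a sanity check against chance.
print(score(lambda question, choices: "A"))
```

Reporting accuracy per language config, rather than one pooled number, is what surfaces the cross-language performance gap the abstract describes.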
