M-IFEval: Multilingual Instruction-Following Evaluation

7 February 2025
Antoine Dussolle, Andrea Cardeña Díaz, Shota Sato, Peter Devine
Abstract

Instruction following is a core capability of modern large language models (LLMs), making the evaluation of this capability essential to understanding these models. The Instruction Following Evaluation (IFEval) benchmark from the literature does this using objective criteria, offering a measure of LLM performance without subjective AI or human judgement. However, it only includes English instructions, limiting its ability to assess LLMs in other languages. We propose the Multilingual Instruction Following Evaluation (M-IFEval) benchmark, expanding the evaluation to French, Japanese, and Spanish, with both general and language-specific instructions. Applying this benchmark to 8 state-of-the-art LLMs, we find that benchmark performance across languages and instruction types can vary widely, underscoring the importance of a multilingual benchmark for evaluating LLMs in a diverse cultural context.
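
The "objective criteria" the abstract refers to are deterministic, rule-based checks: each prompt carries verifiable instructions (e.g., a minimum word count, a required keyword), and a response is scored by whether it satisfies them, with no human or AI judge. Below is a minimal Python sketch of what such checks might look like for an IFEval-style benchmark; the function names, the example instructions, and the Japanese-specific rule are illustrative assumptions, not the paper's actual code.

import re
from typing import Callable

def min_words(n: int) -> Callable[[str], bool]:
    # "Answer in at least n words." Whitespace tokenization suits
    # English, French, and Spanish, but not unsegmented Japanese text.
    return lambda response: len(response.split()) >= n

def contains_keyword(kw: str) -> Callable[[str], bool]:
    # "Include the word kw in your answer."
    return lambda response: kw.lower() in response.lower()

def no_latin_script() -> Callable[[str], bool]:
    # Hypothetical language-specific instruction for Japanese prompts:
    # "Do not use any Latin letters."
    return lambda response: re.search(r"[A-Za-z]", response) is None

def instruction_accuracy(samples: list[tuple[str, list[Callable[[str], bool]]]]) -> float:
    # Fraction of individual instructions followed, pooled over all samples.
    results = [check(resp) for resp, checks in samples for check in checks]
    return sum(results) / len(results)

if __name__ == "__main__":
    samples = [
        ("La capital de España es Madrid, una ciudad grande y animada.",
         [min_words(5), contains_keyword("Madrid")]),
        ("首都は東京です。",  # Japanese response containing no Latin letters
         [no_latin_script()]),
    ]
    print(f"instruction-level accuracy: {instruction_accuracy(samples):.2f}")

The whitespace-based word count illustrates why language-specific instructions matter: a check that is trivially verifiable in French or Spanish can be ill-defined in Japanese, which is written without spaces, so a multilingual benchmark needs checks designed per language rather than translated wholesale.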

View on arXiv: https://arxiv.org/abs/2502.04688
@article{dussolle2025_2502.04688,
  title={M-IFEval: Multilingual Instruction-Following Evaluation},
  author={Antoine Dussolle and Andrea Cardeña Díaz and Shota Sato and Peter Devine},
  journal={arXiv preprint arXiv:2502.04688},
  year={2025}
}