XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
15 April 2021
Sebastian Ruder
Noah Constant
Jan A. Botha
Aditya Siddhant
Orhan Firat
Jinlan Fu
Pengfei Liu
Junjie Hu
Dan Garrette
Graham Neubig
Melvin Johnson
ELM, AAML, LRM

Papers citing "XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation"

50 / 147 papers shown
Rethinking what Matters: Effective and Robust Multilingual Realignment for Low-Resource Languages
Quang Phuoc Nguyen
David Anugraha
Felix Gaschi
Jun Bin Cheng
En-Shiun Annie Lee
136
0
0
09 Nov 2025
TransAlign: Machine Translation Encoders are Strong Word Aligners, Too
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Benedikt Ebing
Christian Goldschmied
Goran Glavaš
76
0
0
31 Oct 2025
Modality Matching Matters: Calibrating Language Distances for Cross-Lingual Transfer in URIEL+
York Hay Ng
Aditya Khan
Xiang Lu
Matteo Salloum
Michael Zhou
Phuong H. Hoang
A. Seza Doğruöz
En-Shiun Annie Lee
104
1
0
22 Oct 2025
Model-Based Ranking of Source Languages for Zero-Shot Cross-Lingual Transfer
Abteen Ebrahimi
Adam Wiemerslage
Katharina von der Wense
LRM
131
0
0
03 Oct 2025
MENLO: From Preferences to Proficiency -- Evaluating and Modeling Native-like Quality Across 47 Languages
Chenxi Whitehouse
Sebastian Ruder
Tony Lin
Oksana Kurylo
Haruka Takagi
Janice Lam
Nicolò Busetto
Denise Diaz
Francisco Guzmán
92
0
0
30 Sep 2025
Evaluating Language Translation Models by Playing Telephone
Syeda Jannatus Saba
Steven Skiena
72
0
0
23 Sep 2025
SinhalaMMLU: A Comprehensive Benchmark for Evaluating Multitask Language Understanding in Sinhala
Ashmari Pramodya
Nirasha Nelki
Heshan Shalinda
Chamila Liyanage
Yusuke Sakai
Randil Pushpananda
Ruvan Weerasinghe
Hidetaka Kamigaito
Taro Watanabe
LRM
87
0
0
03 Sep 2025
Quantifying Language Disparities in Multilingual Large Language Models
Songbo Hu
Ivan Vulić
Anna Korhonen
100
2
0
23 Aug 2025
Cetvel: A Unified Benchmark for Evaluating Language Understanding, Generation and Cultural Capacity of LLMs for Turkish
Yakup Abrek Er
İlker Kesen
Gözde Gül Şahin
Aykut Erdem
ELM, VLM
127
0
0
22 Aug 2025
Survey of NLU Benchmarks Diagnosing Linguistic Phenomena: Why not Standardize Diagnostics Benchmarks?
Khloud Al Jallad
Nada Ghneim
Ghaida Rebdawi
LM&MA, ELM
164
0
0
27 Jul 2025
IndicRAGSuite: Large-Scale Datasets and a Benchmark for Indian Language RAG Systems
Pasunuti Prasanjith
Prathmesh B More
Anoop Kunchukuttan
Mary Dabre
RALM
230
0
0
02 Jun 2025
Moderating Harm: Benchmarking Large Language Models for Cyberbullying Detection in YouTube Comments
International Journal of Computer Applications (IJCA), 2025
Amel Muminovic
ELM, AI4MH
179
0
0
25 May 2025
The Devil Is in the Word Alignment Details: On Translation-Based Cross-Lingual Transfer for Token Classification Tasks
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Benedikt Ebing
Goran Glavaš
262
1
0
15 May 2025
Myanmar XNLI: Building a Dataset and Exploring Low-resource Approaches to Natural Language Inference with Myanmar
Language Resources and Evaluation (LRE), 2025
Aung Kyaw Htet
Mark Dras
114
4
0
13 Apr 2025
LAG-MMLU: Benchmarking Frontier LLM Understanding in Latvian and Giriama
Naome A. Etori
Kevin Lu
Randu Karisa
Arturs Kanepajs
LRM, ELM
931
1
0
14 Mar 2025
MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation
Weihao Xuan
Rui Yang
Heli Qi
Qingcheng Zeng
Yunze Xiao
...
Edison Marrese-Taylor
Shijian Lu
Yusuke Iwasawa
Yutaka Matsuo
Irene Li
ELM
452
27
0
13 Mar 2025
NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Muhammad Farid Adilazuarda
M. Wijanarko
Lucky Susanto
Khumaisa Nur'aini
Derry Wijaya
Alham Fikri Aji
296
2
0
25 Feb 2025
URIEL+: Enhancing Linguistic Inclusion and Usability in a Typological and Multilingual Knowledge Base
International Conference on Computational Linguistics (COLING), 2024
Aditya Khan
Mason Shipton
David Anugraha
Kaiyao Duan
Phuong H. Hoang
Eric Khiu
A. Seza Doğruöz
En-Shiun Annie Lee
VLM
317
13
0
17 Feb 2025
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
International Conference on Learning Representations (ICLR), 2024
Angelika Romanou
Negar Foroutan
Anna Sotnikova
Zeming Chen
Sree Harsha Nelaturu
...
Mike Zhang
Imanol Schlag
Marzieh Fadaee
Sara Hooker
Antoine Bosselut
ELM
336
28
0
29 Nov 2024
DiffSLT: Enhancing Diversity in Sign Language Translation via Diffusion Model
Pattern Recognition Letters (PR), 2024
JiHwan Moon
Jihoon Park
Jungeun Kim
Jongseong Bae
Hyeongwoo Jeon
Ha Young Kim
252
2
0
26 Nov 2024
Cross-lingual Back-Parsing: Utterance Synthesis from Meaning Representation for Zero-Resource Semantic Parsing
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Deokhyung Kang
Seonjeong Hwang
Yunsu Kim
Gary Geunbae Lee
257
0
0
01 Oct 2024
XTRUST: On the Multilingual Trustworthiness of Large Language Models
Yahan Li
Yi Wang
Yi-Ju Chang
Yuan Wu
LRM, HILM
156
2
0
24 Sep 2024
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination
International Conference on Computational Linguistics (COLING), 2024
Eva Sánchez Salido
Roser Morante
Julio Gonzalo
Guillermo Marco
Jorge Carrillo-de-Albornoz
...
Enrique Amigó
Andrés Fernández
Alejandro Benito-Santos
Adrián Ghajari Espinosa
Victor Fresno
ELM
246
0
0
19 Sep 2024
AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs
Basel Mousi
Nadir Durrani
Fatema Ahmad
Md. Arid Hasan
Maram Hasanain
Tameem Kabbani
Fahim Dalvi
Shammur A. Chowdhury
Firoj Alam
243
35
0
17 Sep 2024
Do Large Language Models Speak All Languages Equally? A Comparative Study in Low-Resource Settings
Md. Arid Hasan
Prerona Tarannum
Krishno Dey
Imran Razzak
Usman Naseem
206
10
0
05 Aug 2024
mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Xin Zhang
Yanzhao Zhang
Dingkun Long
Wen Xie
Ziqi Dai
...
Pengjun Xie
Fei Huang
Meishan Zhang
Wenjie Li
Min Zhang
274
216
0
29 Jul 2024
sPhinX: Sample Efficient Multilingual Instruction Fine-Tuning Through N-shot Guided Prompting
Sanchit Ahuja
Kumar Tanmay
Hardik Hansrajbhai Chauhan
Barun Patra
Kriti Aggarwal
...
Tejas I. Dhamecha
Ahmed Awadallah
Monojit Choudhury
Vishrav Chaudhary
Sunayana Sitaram
362
4
0
13 Jul 2024
Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models
Nikhil Sharma
Kenton Murray
Ziang Xiao
411
5
0
07 Jul 2024
Disce aut Deficere: Evaluating LLMs Proficiency on the INVALSI Italian Benchmark
Fabio Mercorio
Mario Mezzanzanica
Daniele Potertì
Antonio Serino
Andrea Seveso
217
9
0
25 Jun 2024
PARIKSHA : A Large-Scale Investigation of Human-LLM Evaluator Agreement on Multilingual and Multi-Cultural Data
Ishaan Watts
Varun Gumma
Aditya Yadavalli
Vivek Seshadri
Manohar Swaminathan
Sunayana Sitaram
ELM
223
23
0
21 Jun 2024
On the Evaluation Practices in Multilingual NLP: Can Machine Translation Offer an Alternative to Human Translations?
Rochelle Choenni
Sara Rajaee
Christof Monz
Ekaterina Shutova
275
5
0
20 Jun 2024
Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Fabian David Schmidt
Philipp Borchert
Ivan Vulić
Goran Glavaš
194
8
0
18 Jun 2024
Decoding the Diversity: A Review of the Indic AI Research Landscape
Sankalp KJ
Vinija Jain
S. Bhaduri
Tamoghna Roy
Vasu Sharma
191
7
0
13 Jun 2024
MINERS: Multilingual Language Models as Semantic Retrievers
Genta Indra Winata
Ruochen Zhang
David Ifeoluwa Adelani
RALM
372
12
0
11 Jun 2024
From Form(s) to Meaning: Probing the Semantic Depths of Language Models Using Multisense Consistency
Xenia Ohmer
Elia Bruni
Dieuwke Hupkes
AI4CE
249
9
0
18 Apr 2024
Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers
Libo Qin
Qiguang Chen
Yuhang Zhou
Zhi Chen
Hai-Tao Zheng
Lizi Liao
Min Li
Wanxiang Che
Philip S. Yu
LRM
331
53
0
07 Apr 2024
DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages
Fahim Faisal
Orevaoghene Ahia
Aarohi Srivastava
Kabir Ahuja
David Chiang
Yulia Tsvetkov
Antonios Anastasopoulos
194
45
0
16 Mar 2024
Cost-Performance Optimization for Processing Low-Resource Language Tasks Using Commercial LLMs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Arijit Nag
Animesh Mukherjee
Niloy Ganguly
Soumen Chakrabarti
195
8
0
08 Mar 2024
Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ
Carolin Holtermann
Paul Röttger
Timm Dill
Anne Lauscher
ELM, LRM
235
33
0
06 Mar 2024
Could We Have Had Better Multilingual LLMs If English Was Not the Central Language?
Ryandito Diandaru
Lucky Susanto
Zilu Tang
Ayu Purwarianti
Derry Wijaya
243
3
0
21 Feb 2024
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic
Fajri Koto
Jinyan Su
Sara Shatnawi
Jad Doughman
Abdelrahman Boda Sadallah
...
Neha Sengupta
Shady Shehata
Farah E. Shamout
Preslav Nakov
Timothy Baldwin
ELM, LRM
285
71
0
20 Feb 2024
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Shivalika Singh
Freddie Vargus
Daniel D'souza
Börje F. Karlsson
Abinaya Mahendiran
...
Max Bartolo
Julia Kreutzer
Ahmet Üstün
Marzieh Fadaee
Sara Hooker
339
168
0
09 Feb 2024
What is "Typological Diversity" in NLP?
Esther Ploeger
Wessel Poelman
Miryam de Lhoneux
Johannes Bjerva
361
4
0
06 Feb 2024
Analyzing the Evaluation of Cross-Lingual Knowledge Transfer in Multilingual Language Models
Sara Rajaee
Christof Monz
222
10
0
03 Feb 2024
Translation Errors Significantly Impact Low-Resource Languages in Cross-Lingual Learning
Ashish Agrawal
Barah Fazili
Preethi Jyothi
219
7
0
03 Feb 2024
ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks
Bolei Ma
Ercong Nie
Shuzhou Yuan
Helmut Schmid
Michael Färber
Frauke Kreuter
Hinrich Schütze
VLM
289
9
0
29 Jan 2024
Discovering Low-rank Subspaces for Language-agnostic Multilingual Representations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zhihui Xie
Handong Zhao
Tong Yu
Shuai Li
159
16
0
11 Jan 2024
Understanding LLMs: A Comprehensive Overview from Training to Inference
Yi-Hsueh Liu
Haoyang He
Tianle Han
Xu-Yao Zhang
Mengyuan Liu
...
Xiaoyan Cai
Tuo Zhang
Ning Qiang
Tianming Liu
Bao Ge
SyDa
402
120
0
04 Jan 2024
To Translate or Not to Translate: A Systematic Investigation of Translation-Based Cross-Lingual Transfer to Low-Resource Languages
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Benedikt Ebing
Goran Glavaš
218
6
0
15 Nov 2023
PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhihan Zhang
Dong-Ho Lee
Yuwei Fang
Wenhao Yu
Mengzhao Jia
Meng Jiang
Francesco Barbieri
ALM
263
41
0
15 Nov 2023