Beyond Static Models and Test Sets: Benchmarking the Potential of Pre-trained Models Across Tasks and Languages

12 May 2022

Papers citing "Beyond Static Models and Test Sets: Benchmarking the Potential of Pre-trained Models Across Tasks and Languages"

10 / 10 papers shown

Uncovering inequalities in new knowledge learning by large language models across different languages

291

06 Mar 2025

SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text Generation

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

390

10 Feb 2025

PARIKSHA : A Large-Scale Investigation of Human-LLM Evaluator Agreement on Multilingual and Multi-Cultural Data

275

21 Jun 2024

METAL: Towards Multilingual Meta-Evaluation

220

02 Apr 2024

Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?

Findings (Findings), 2023

306

14 Sep 2023

MEGA: Multilingual Evaluation of Generative AI

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

548

348

22 Mar 2023

On the Calibration of Massively Multilingual Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

217

21 Oct 2022

i-Code: An Integrative and Composable Multimodal Learning Framework

AAAI Conference on Artificial Intelligence (AAAI), 2022

284

03 May 2022

Multilingual CheckList: Generation and Evaluation

309

24 Mar 2022

TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages

Transactions of the Association for Computational Linguistics (TACL), 2020

544

688

10 Mar 2020