GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

22 June 2022

Alexandros Papangelis

Aman Madaan

Angelina McMillan-Major

Bingsheng Yao

Esin Durmus

Jekaterina Novikova

Laura Perez-Beltrachini

Leonardo F. R. Ribeiro

Pawan Sasanka Ammanamanchi

Simon Mille

Papers citing "GEMv2: Multilingual NLG Benchmarking in a Single Line of Code"

Survey of NLU Benchmarks Diagnosing Linguistic Phenomena: Why not Standardize Diagnostics Benchmarks?

288

27 Jul 2025

Improving Model Evaluation using SMART Filtering of Benchmark Datasets

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

798

26 Oct 2024

LexSumm and LexT5: Benchmarking and Modeling Legal Summarization Tasks in English

563

12 Oct 2024

Teaching LLMs to Abstain across Languages via Multilingual Feedback

Vidhisha Balachandran

Sunayana Sitaram

Yulia Tsvetkov

668

22 Jun 2024

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

...

Sean Welleck

Graham Neubig

Moontae Lee

Kyungjae Lee

Minjoon Seo

ELM ALM LM&MA

551

09 Jun 2024

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Graham Neubig

470

391

02 May 2024

InspectorRAGet: An Introspection Platform for RAG Evaluation

Yannis Katsis

170

26 Apr 2024

Understanding Cross-Lingual Alignment: A Survey

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Katharina Hämmerl

Jindřich Libovický

Kangyang Luo

369

09 Apr 2024

Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

...

437

195

09 Feb 2024

Cheetah: Natural Language Generation for 517 African Languages

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Ife Adebara

AbdelRahim Elmadany

Muhammad Abdul-Mageed

408

02 Jan 2024

Evaluating General-Purpose AI with Psychometrics

Xiting Wang

Liming Jiang

Jose Hernandez-Orallo

Xing Xie

281

25 Oct 2023

Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation

359

22 Oct 2023

NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages

International Joint Conference on Natural Language Processing (IJCNLP), 2023

...

355

19 Sep 2023

Dolphin: A Challenging and Diverse Benchmark for Arabic NLG

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

El Moatez Billah Nagoudi

AbdelRahim Elmadany

Ahmed Oumar El-Shangiti

Muhammad Abdul-Mageed

LM&MA

377

24 May 2023

InstructAlign: High-and-Low Resource Language Alignment via Continual Crosslingual Instruction Tuning

272

23 May 2023

Cross-Lingual Supervision improves Large Language Models Pre-training

229

19 May 2023

Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

396

17 May 2023

A Systematic Study of Knowledge Distillation for Natural Language Generation with Pseudo-Target Training

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

362

03 May 2023

Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation

Patrick Fernandes

Aman Madaan

Emmy Liu

António Farinhas

Pedro Henrique Martins

...

José G. C. de Souza

Shuyan Zhou

Tongshuang Wu

Graham Neubig

Marcely Zanon Boito

ALM

385

01 May 2023

Evaluation for Change

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Rishi Bommasani

ELM

282

20 Dec 2022

Evaluating Human-Language Model Interaction

Esin Durmus

...

439

121

19 Dec 2022

NusaCrowd: Open Source Initiative for Indonesian NLP Resources

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

...

555

19 Dec 2022

CiteBench: A Benchmark for Scientific Citation Text Generation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Martin Funkquist

Ilia Kuznetsov

Yufang Hou

Iryna Gurevych

322

19 Dec 2022

Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

...

402

164

15 Dec 2022

A Major Obstacle for NLP Research: Let's Talk about Time Allocation!

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Katharina Kann

Shiran Dudy

Arya D. McCarthy

283

30 Nov 2022

Measuring the Measuring Tools: An Automatic Evaluation of Semantic Metrics for Text Corpora

Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), 2022

215

29 Nov 2022

Operationalizing Specifications, In Addition to Test Sets for Evaluating Constrained Generative Models

288

19 Nov 2022

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Angela Fan

...

1.0K

2,879

09 Nov 2022

CLSE: Corpus of Linguistically Significant Entities

Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), 2022

A. Chuklin

Justin Zhao

Mihir Kale

329

04 Nov 2022

Finding Memo: Extractive Memorization in Constrained Sequence Generation Tasks

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Vikas Raunak

Arul Menezes

193

24 Oct 2022

BanglaParaphrase: A High-Quality Bangla Paraphrase Dataset

292

11 Oct 2022

Petals: Collaborative Inference and Fine-tuning of Large Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Tim Dettmers

277

110

02 Sep 2022

RealTime QA: What's the Answer Right Now?

Neural Information Processing Systems (NeurIPS), 2022

Keisuke Sakaguchi

Yejin Choi

531

277

27 Jul 2022

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

...

Yue Zhang

491

06 Dec 2021

Control Prefixes for Parameter-Efficient Text Generation

Jordan Clive

Kris Cao

Marek Rei

330

15 Oct 2021