What are the best systems? New perspectives on NLP Benchmarking

8 February 2022

Ekhine Irurozki

Stéphan Clémençon

Papers citing "What are the best systems? New perspectives on NLP Benchmarking"

17 / 17 papers shown

EuroBERT: Scaling Multilingual Encoders for European Languages Nicolas Boizard Hippolyte Gisserot-Boukhlef Duarte M. Alves André F. T. Martins Ayoub Hammal ... Maxime Peyrard Nuno M. Guerreiro Patrick Fernandes Ricardo Rei Pierre Colombo 79 1 0 07 Mar 2025
Navigating the Maze of Explainable AI: A Systematic Approach to Evaluating Methods and Metrics Lukas Klein Carsten T. Lüth U. Schlegel Till J. Bungert Mennatallah El-Assady Paul F. Jäger XAI ELM 34 1 0 03 Jan 2025
Toward Stronger Textual Attack Detectors Pierre Colombo Marine Picot Nathan Noiry Guillaume Staerman Pablo Piantanida 33 5 0 21 Oct 2023
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks Anas Himmi Ekhine Irurozki Nathan Noiry Stéphan Clémençon Pierre Colombo 19 5 0 17 May 2023
Beyond Mahalanobis-Based Scores for Textual OOD Detection Pierre Colombo Eduardo Dadalto Camara Gomes Guillaume Staerman Nathan Noiry Pablo Piantanida OODD 22 5 0 24 Nov 2022
The Glass Ceiling of Automatic Evaluation in Natural Language Generation Pierre Colombo Maxime Peyrard Nathan Noiry Robert West Pablo Piantanida 27 11 0 31 Aug 2022
Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation Cyril Chhun Pierre Colombo Chloé Clavel Fabian M. Suchanek 49 50 0 24 Aug 2022
Learning Disentangled Textual Representations via Statistical Measures of Similarity Pierre Colombo Guillaume Staerman Nathan Noiry Pablo Piantanida FaML DRL 33 21 0 07 May 2022
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation Kaustubh D. Dhole Varun Gangal Sebastian Gehrmann Aadesh Gupta Zhenhao Li ... Tianbao Xie Usama Yaseen Michael A. Yee Jing Zhang Yue Zhang 169 86 0 06 Dec 2021
InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation Pierre Colombo Chloé Clavel Pablo Piantanida 30 41 0 02 Dec 2021
Multitask Prompted Training Enables Zero-Shot Task Generalization Victor Sanh Albert Webson Colin Raffel Stephen H. Bach Lintang Sutawika ... T. Bers Stella Biderman Leo Gao Thomas Wolf Alexander M. Rush LRM 208 1,654 0 15 Oct 2021
Beam Search with Bidirectional Strategies for Neural Response Generation Pierre Colombo Chouchang Yang Giovanna Varni Chloé Clavel 35 13 0 07 Oct 2021
FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding Yanan Zheng Jing Zhou Yujie Qian Ming Ding Chonghua Liao Jian Li Ruslan Salakhutdinov Jie Tang Sebastian Ruder Zhilin Yang ELM 204 29 0 27 Sep 2021
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics Sebastian Gehrmann Tosin P. Adewumi Karmanya Aggarwal Pawan Sasanka Ammanamanchi Aremu Anuoluwapo ... Nishant Subramani Wei-ping Xu Diyi Yang Akhila Yerukola Jiawei Zhou VLM 246 283 0 02 Feb 2021
MLQA: Evaluating Cross-lingual Extractive Question Answering Patrick Lewis Barlas Oğuz Ruty Rinott Sebastian Riedel Holger Schwenk ELM 242 490 0 16 Oct 2019
What you can cram into a single vector: Probing sentence embeddings for linguistic properties Alexis Conneau Germán Kruszewski Guillaume Lample Loïc Barrault Marco Baroni 199 879 0 03 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 294 6,943 0 20 Apr 2018