Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2202.03799
Cited By
What are the best systems? New perspectives on NLP Benchmarking
8 February 2022
Pierre Colombo
Nathan Noiry
Ekhine Irurozki
Stéphan Clémençon
Re-assign community
ArXiv
PDF
HTML
Papers citing
"What are the best systems? New perspectives on NLP Benchmarking"
17 / 17 papers shown
Title
EuroBERT: Scaling Multilingual Encoders for European Languages
Nicolas Boizard
Hippolyte Gisserot-Boukhlef
Duarte M. Alves
André F. T. Martins
Ayoub Hammal
...
Maxime Peyrard
Nuno M. Guerreiro
Patrick Fernandes
Ricardo Rei
Pierre Colombo
79
1
0
07 Mar 2025
Navigating the Maze of Explainable AI: A Systematic Approach to Evaluating Methods and Metrics
Lukas Klein
Carsten T. Lüth
U. Schlegel
Till J. Bungert
Mennatallah El-Assady
Paul F. Jäger
XAI
ELM
34
1
0
03 Jan 2025
Toward Stronger Textual Attack Detectors
Pierre Colombo
Marine Picot
Nathan Noiry
Guillaume Staerman
Pablo Piantanida
33
5
0
21 Oct 2023
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks
Anas Himmi
Ekhine Irurozki
Nathan Noiry
Stéphan Clémençon
Pierre Colombo
19
5
0
17 May 2023
Beyond Mahalanobis-Based Scores for Textual OOD Detection
Pierre Colombo
Eduardo Dadalto Camara Gomes
Guillaume Staerman
Nathan Noiry
Pablo Piantanida
OODD
22
5
0
24 Nov 2022
The Glass Ceiling of Automatic Evaluation in Natural Language Generation
Pierre Colombo
Maxime Peyrard
Nathan Noiry
Robert West
Pablo Piantanida
27
11
0
31 Aug 2022
Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation
Cyril Chhun
Pierre Colombo
Chloé Clavel
Fabian M. Suchanek
49
50
0
24 Aug 2022
Learning Disentangled Textual Representations via Statistical Measures of Similarity
Pierre Colombo
Guillaume Staerman
Nathan Noiry
Pablo Piantanida
FaML
DRL
33
21
0
07 May 2022
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
Kaustubh D. Dhole
Varun Gangal
Sebastian Gehrmann
Aadesh Gupta
Zhenhao Li
...
Tianbao Xie
Usama Yaseen
Michael A. Yee
Jing Zhang
Yue Zhang
169
86
0
06 Dec 2021
InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation
Pierre Colombo
Chloe Clave
Pablo Piantanida
30
41
0
02 Dec 2021
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
208
1,654
0
15 Oct 2021
Beam Search with Bidirectional Strategies for Neural Response Generation
Pierre Colombo
Chouchang Yang
Giovanna Varni
Chloé Clavel
35
13
0
07 Oct 2021
FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding
Yanan Zheng
Jing Zhou
Yujie Qian
Ming Ding
Chonghua Liao
Jian Li
Ruslan Salakhutdinov
Jie Tang
Sebastian Ruder
Zhilin Yang
ELM
204
29
0
27 Sep 2021
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann
Tosin P. Adewumi
Karmanya Aggarwal
Pawan Sasanka Ammanamanchi
Aremu Anuoluwapo
...
Nishant Subramani
Wei-ping Xu
Diyi Yang
Akhila Yerukola
Jiawei Zhou
VLM
246
283
0
02 Feb 2021
MLQA: Evaluating Cross-lingual Extractive Question Answering
Patrick Lewis
Barlas Oğuz
Ruty Rinott
Sebastian Riedel
Holger Schwenk
ELM
242
490
0
16 Oct 2019
What you can cram into a single vector: Probing sentence embeddings for linguistic properties
Alexis Conneau
Germán Kruszewski
Guillaume Lample
Loïc Barrault
Marco Baroni
199
879
0
03 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,943
0
20 Apr 2018
1