2305.10284
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks
17 May 2023
Anas Himmi
Ekhine Irurozki
Nathan Noiry
Stéphan Clémençon
Pierre Colombo
Papers citing
"Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks"
10 / 10 papers shown
Title
Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks
Guanhua Zhang
Moritz Hardt
27
7
0
02 May 2024
Revisiting Instruction Fine-tuned Model Evaluation to Guide Industrial Applications
Manuel Faysse
Gautier Viaud
Céline Hudelot
Pierre Colombo
17
9
0
21 Oct 2023
Toward Stronger Textual Attack Detectors
Pierre Colombo
Marine Picot
Nathan Noiry
Guillaume Staerman
Pablo Piantanida
20
5
0
21 Oct 2023
Transductive Learning for Textual Few-Shot Classification in API-based Embedding Models
Pierre Colombo
Victor Pellegrain
Malik Boudiaf
Victor Storchan
Myriam Tami
Ismail Ben Ayed
Céline Hudelot
Pablo Piantanida
14
8
0
21 Oct 2023
State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Ryan Cotterell
Zhijing Jin
103
91
0
06 Oct 2022
Beam Search with Bidirectional Strategies for Neural Response Generation
Pierre Colombo
Chouchang Yang
Giovanna Varni
Chloé Clavel
19
13
0
07 Oct 2021
Few-Shot Emotion Recognition in Conversation with Sequential Prototypical Networks
Gaël Guibon
Matthieu Labeau
Hélène Flamein
Luce Lefeuvre
Chloé Clavel
34
33
0
20 Sep 2021
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann
Tosin P. Adewumi
Karmanya Aggarwal
Pawan Sasanka Ammanamanchi
Aremu Anuoluwapo
...
Nishant Subramani
Wei-ping Xu
Diyi Yang
Akhila Yerukola
Jiawei Zhou
VLM
238
284
0
02 Feb 2021
MLQA: Evaluating Cross-lingual Extractive Question Answering
Patrick Lewis
Barlas Oğuz
Ruty Rinott
Sebastian Riedel
Holger Schwenk
ELM
239
489
0
16 Oct 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018