ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.10284
  4. Cited By
Towards More Robust NLP System Evaluation: Handling Missing Scores in
  Benchmarks

Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks

17 May 2023
Anas Himmi
Ekhine Irurozki
Nathan Noiry
Stéphan Clémençon
Pierre Colombo
ArXivPDFHTML

Papers citing "Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks"

10 / 10 papers shown
Title
Inherent Trade-Offs between Diversity and Stability in Multi-Task
  Benchmarks
Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks
Guanhua Zhang
Moritz Hardt
27
7
0
02 May 2024
Revisiting Instruction Fine-tuned Model Evaluation to Guide Industrial
  Applications
Revisiting Instruction Fine-tuned Model Evaluation to Guide Industrial Applications
Manuel Faysse
Gautier Viaud
C´eline Hudelot
Pierre Colombo
17
9
0
21 Oct 2023
Toward Stronger Textual Attack Detectors
Toward Stronger Textual Attack Detectors
Pierre Colombo
Marine Picot
Nathan Noiry
Guillaume Staerman
Pablo Piantanida
20
5
0
21 Oct 2023
Transductive Learning for Textual Few-Shot Classification in API-based
  Embedding Models
Transductive Learning for Textual Few-Shot Classification in API-based Embedding Models
Pierre Colombo
Victor Pellegrain
Malik Boudiaf
Victor Storchan
Myriam Tami
Ismail Ben Ayed
C´eline Hudelot
Pablo Piantanida
14
8
0
21 Oct 2023
State-of-the-art generalisation research in NLP: A taxonomy and review
State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Ryan Cotterell
Zhijing Jin
103
91
0
06 Oct 2022
Beam Search with Bidirectional Strategies for Neural Response Generation
Beam Search with Bidirectional Strategies for Neural Response Generation
Pierre Colombo
Chouchang Yang
Giovanna Varni
Chloé Clavel
19
13
0
07 Oct 2021
Few-Shot Emotion Recognition in Conversation with Sequential
  Prototypical Networks
Few-Shot Emotion Recognition in Conversation with Sequential Prototypical Networks
Gaël Guibon
Matthieu Labeau
Hélène Flamein
Luce Lefeuvre
Chloé Clavel
34
33
0
20 Sep 2021
The GEM Benchmark: Natural Language Generation, its Evaluation and
  Metrics
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann
Tosin P. Adewumi
Karmanya Aggarwal
Pawan Sasanka Ammanamanchi
Aremu Anuoluwapo
...
Nishant Subramani
Wei-ping Xu
Diyi Yang
Akhila Yerukola
Jiawei Zhou
VLM
238
284
0
02 Feb 2021
MLQA: Evaluating Cross-lingual Extractive Question Answering
MLQA: Evaluating Cross-lingual Extractive Question Answering
Patrick Lewis
Barlas Oğuz
Ruty Rinott
Sebastian Riedel
Holger Schwenk
ELM
239
489
0
16 Oct 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
1