The Glass Ceiling of Automatic Evaluation in Natural Language Generation

31 August 2022
Pierre Colombo
Maxime Peyrard
Nathan Noiry
Robert West
Pablo Piantanida

Papers citing "The Glass Ceiling of Automatic Evaluation in Natural Language Generation"

11 / 11 papers shown
Agree to Disagree? A Meta-Evaluation of LLM Misgendering
Arjun Subramonian
Vagrant Gautam
Preethi Seshadri
Dietrich Klakow
Kai-Wei Chang
Yizhou Sun
27
1
0
23 Apr 2025
Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles
Samia Touileb
Vladislav Mikhailov
Marie Kroka
Lilja Øvrelid
Erik Velldal
39
3
0
13 Jan 2025
Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics
Théo Gigant
Camille Guinaudeau
Marc Decombas
Frédéric Dufaux
45
1
0
08 Oct 2024
Do Language Models Enjoy Their Own Stories? Prompting Large Language Models for Automatic Story Evaluation
Cyril Chhun
Fabian M. Suchanek
Chloé Clavel
LRM
42
13
0
22 May 2024
MERA: A Comprehensive LLM Evaluation in Russian
Alena Fenogenova
Artem Chervyakov
Nikita Martynov
Anastasia Kozlova
Maria Tikhonova
...
Nikita Savushkin
Polina Mikhailova
Denis Dimitrov
Alexander Panchenko
Sergey Markov
ELM
28
10
0
09 Jan 2024
Toward Stronger Textual Attack Detectors
Pierre Colombo
Marine Picot
Nathan Noiry
Guillaume Staerman
Pablo Piantanida
33
5
0
21 Oct 2023
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks
Anas Himmi
Ekhine Irurozki
Nathan Noiry
Stéphan Clémençon
Pierre Colombo
19
5
0
17 May 2023
The Current State of Summarization
Fabian Retkowski
18
6
0
08 May 2023
A Meta-Evaluation of Faithfulness Metrics for Long-Form Hospital-Course Summarization
Griffin Adams
Jason Zucker
Noémie Elhadad
46
22
0
07 Mar 2023
Rainproof: An Umbrella To Shield Text Generators From Out-Of-Distribution Data
Maxime Darrin
Pablo Piantanida
Pierre Colombo
OODD
32
12
0
18 Dec 2022
Beam Search with Bidirectional Strategies for Neural Response Generation
Pierre Colombo
Chouchang Yang
Giovanna Varni
Chloé Clavel
35
13
0
07 Oct 2021