RankME: Reliable Human Ratings for Natural Language Generation


15 March 2018
Jekaterina Novikova
Ondrej Dusek
Verena Rieser
    ALM

Papers citing "RankME: Reliable Human Ratings for Natural Language Generation"

25 / 25 papers shown
The Viability of Crowdsourcing for RAG Evaluation
Lukas Gienapp
Tim Hagen
Maik Frobe
Matthias Hagen
Benno Stein
Martin Potthast
Harrisen Scells
21
0
0
22 Apr 2025
TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation
Jonathan Cook
Tim Rocktaschel
Jakob Foerster
Dennis Aumiller
Alex Wang
ALM
31
10
0
04 Oct 2024
DHP Benchmark: Are LLMs Good NLG Evaluators?
Yicheng Wang
Jiayi Yuan
Yu-Neng Chuang
Zhuoer Wang
Yingchi Liu
Mark Cusick
Param Kulkarni
Zhengping Ji
Yasser Ibrahim
Xia Hu
LM&MA
ELM
47
3
0
25 Aug 2024
AI-Assisted Human Evaluation of Machine Translation
Vilém Zouhar
Tom Kocmi
Mrinmaya Sachan
35
5
0
18 Jun 2024
Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems
Clemencia Siro
Mohammad Aliannejadi
Maarten de Rijke
27
3
0
15 Apr 2024
How Much Annotation is Needed to Compare Summarization Models?
Chantal Shaib
Joe Barrow
Alexa F. Siu
Byron C. Wallace
A. Nenkova
45
2
0
28 Feb 2024
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks
Anas Himmi
Ekhine Irurozki
Nathan Noiry
Stéphan Clémençon
Pierre Colombo
28
5
0
17 May 2023
NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries?
Saadia Gabriel
Hamid Palangi
Yejin Choi
AAML
37
1
0
08 Nov 2022
The Glass Ceiling of Automatic Evaluation in Natural Language Generation
Pierre Colombo
Maxime Peyrard
Nathan Noiry
Robert West
Pablo Piantanida
41
11
0
31 Aug 2022
Innovations in Neural Data-to-text Generation: A Survey
Mandar Sharma
Ajay K. Gogineni
Naren Ramakrishnan
29
10
0
25 Jul 2022
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges
Shikib Mehri
Jinho Choi
L. F. D’Haro
Jan Deriu
M. Eskénazi
...
David Traum
Yi-Ting Yeh
Zhou Yu
Yizhe Zhang
Chen Zhang
30
21
0
18 Mar 2022
Achieving Reliable Human Assessment of Open-Domain Dialogue Systems
Tianbo Ji
Yvette Graham
Gareth J. F. Jones
Chenyang Lyu
Qun Liu
ALM
29
39
0
11 Mar 2022
Czech Grammar Error Correction with a Large and Diverse Corpus
Jakub Náplava
Milan Straka
Jana Straková
Alexandr Rosen
25
32
0
14 Jan 2022
A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models
Hanqing Zhang
Haolin Song
Shaoyu Li
Ming Zhou
Dawei Song
40
213
0
14 Jan 2022
AutoChart: A Dataset for Chart-to-Text Generation Task
Jiawen Zhu
Jinye Ran
Roy Ka-Wei Lee
Kenny Choo
Zhi Li
25
15
0
16 Aug 2021
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling
Emily Dinan
Gavin Abercrombie
A. S. Bergman
Shannon L. Spruit
Dirk Hovy
Y-Lan Boureau
Verena Rieser
34
105
0
07 Jul 2021
Is GPT-3 Text Indistinguishable from Human Text? Scarecrow: A Framework for Scrutinizing Machine Text
Yao Dou
Maxwell Forbes
Rik Koncel-Kedziorski
Noah A. Smith
Yejin Choi
DeLMO
8
126
0
02 Jul 2021
All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text
Elizabeth Clark
Tal August
Sofia Serrano
Nikita Haduong
Suchin Gururangan
Noah A. Smith
DeLMO
34
394
0
30 Jun 2021
A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems
Craig Thomson
Ehud Reiter
14
51
0
08 Nov 2020
Local Knowledge Powered Conversational Agents
Sashank Santhanam
Wei Ping
Raul Puri
M. Shoeybi
M. Patwary
Bryan Catanzaro
19
4
0
20 Oct 2020
Evaluation of Text Generation: A Survey
Asli Celikyilmaz
Elizabeth Clark
Jianfeng Gao
ELM
LM&MA
19
376
0
26 Jun 2020
A Crowd-based Evaluation of Abuse Response Strategies in Conversational Agents
A. C. Curry
Verena Rieser
20
31
0
10 Sep 2019
Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge
Ondrej Dusek
Jekaterina Novikova
Verena Rieser
ELM
32
231
0
23 Jan 2019
Findings of the E2E NLG Challenge
Ondrej Dusek
Jekaterina Novikova
Verena Rieser
18
115
0
02 Oct 2018
Efficient Online Scalar Annotation with Bounded Support
Keisuke Sakaguchi
Benjamin Van Durme
11
45
0
04 Jun 2018