1803.05928
RankME: Reliable Human Ratings for Natural Language Generation
15 March 2018
Jekaterina Novikova
Ondrej Dusek
Verena Rieser
ALM
Papers citing
"RankME: Reliable Human Ratings for Natural Language Generation"
25 / 25 papers shown
The Viability of Crowdsourcing for RAG Evaluation
Lukas Gienapp
Tim Hagen
Maik Frobe
Matthias Hagen
Benno Stein
Martin Potthast
Harrisen Scells
21
0
0
22 Apr 2025
TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation
Jonathan Cook
Tim Rocktaschel
Jakob Foerster
Dennis Aumiller
Alex Wang
ALM
31
10
0
04 Oct 2024
DHP Benchmark: Are LLMs Good NLG Evaluators?
Yicheng Wang
Jiayi Yuan
Yu-Neng Chuang
Zhuoer Wang
Yingchi Liu
Mark Cusick
Param Kulkarni
Zhengping Ji
Yasser Ibrahim
Xia Hu
LM&MA
ELM
47
3
0
25 Aug 2024
AI-Assisted Human Evaluation of Machine Translation
Vilém Zouhar
Tom Kocmi
Mrinmaya Sachan
35
5
0
18 Jun 2024
Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems
Clemencia Siro
Mohammad Aliannejadi
Maarten de Rijke
27
3
0
15 Apr 2024
How Much Annotation is Needed to Compare Summarization Models?
Chantal Shaib
Joe Barrow
Alexa F. Siu
Byron C. Wallace
A. Nenkova
45
2
0
28 Feb 2024
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks
Anas Himmi
Ekhine Irurozki
Nathan Noiry
Stéphan Clémençon
Pierre Colombo
28
5
0
17 May 2023
NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries?
Saadia Gabriel
Hamid Palangi
Yejin Choi
AAML
37
1
0
08 Nov 2022
The Glass Ceiling of Automatic Evaluation in Natural Language Generation
Pierre Colombo
Maxime Peyrard
Nathan Noiry
Robert West
Pablo Piantanida
41
11
0
31 Aug 2022
Innovations in Neural Data-to-text Generation: A Survey
Mandar Sharma
Ajay K. Gogineni
Naren Ramakrishnan
29
10
0
25 Jul 2022
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges
Shikib Mehri
Jinho Choi
L. F. D’Haro
Jan Deriu
M. Eskénazi
...
David Traum
Yi-Ting Yeh
Zhou Yu
Yizhe Zhang
Chen Zhang
30
21
0
18 Mar 2022
Achieving Reliable Human Assessment of Open-Domain Dialogue Systems
Tianbo Ji
Yvette Graham
Gareth J. F. Jones
Chenyang Lyu
Qun Liu
ALM
29
39
0
11 Mar 2022
Czech Grammar Error Correction with a Large and Diverse Corpus
Jakub Náplava
Milan Straka
Jana Straková
Alexandr Rosen
25
32
0
14 Jan 2022
A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models
Hanqing Zhang
Haolin Song
Shaoyu Li
Ming Zhou
Dawei Song
40
213
0
14 Jan 2022
AutoChart: A Dataset for Chart-to-Text Generation Task
Jiawen Zhu
Jinye Ran
Roy Ka-Wei Lee
Kenny Choo
Zhi Li
25
15
0
16 Aug 2021
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling
Emily Dinan
Gavin Abercrombie
A. S. Bergman
Shannon L. Spruit
Dirk Hovy
Y-Lan Boureau
Verena Rieser
34
105
0
07 Jul 2021
Is GPT-3 Text Indistinguishable from Human Text? Scarecrow: A Framework for Scrutinizing Machine Text
Yao Dou
Maxwell Forbes
Rik Koncel-Kedziorski
Noah A. Smith
Yejin Choi
DeLMO
8
126
0
02 Jul 2021
All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text
Elizabeth Clark
Tal August
Sofia Serrano
Nikita Haduong
Suchin Gururangan
Noah A. Smith
DeLMO
34
394
0
30 Jun 2021
A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems
Craig Thomson
Ehud Reiter
14
51
0
08 Nov 2020
Local Knowledge Powered Conversational Agents
Sashank Santhanam
Wei Ping
Raul Puri
M. Shoeybi
M. Patwary
Bryan Catanzaro
19
4
0
20 Oct 2020
Evaluation of Text Generation: A Survey
Asli Celikyilmaz
Elizabeth Clark
Jianfeng Gao
ELM
LM&MA
19
376
0
26 Jun 2020
A Crowd-based Evaluation of Abuse Response Strategies in Conversational Agents
A. C. Curry
Verena Rieser
20
31
0
10 Sep 2019
Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge
Ondrej Dusek
Jekaterina Novikova
Verena Rieser
ELM
32
231
0
23 Jan 2019
Findings of the E2E NLG Challenge
Ondrej Dusek
Jekaterina Novikova
Verena Rieser
18
115
0
02 Oct 2018
Efficient Online Scalar Annotation with Bounded Support
Keisuke Sakaguchi
Benjamin Van Durme
11
45
0
04 Jun 2018