Why We Need New Evaluation Metrics for NLG

21 July 2017
Jekaterina Novikova
Ondrej Dusek
A. C. Curry
Verena Rieser
arXiv: 1707.06875

Papers citing "Why We Need New Evaluation Metrics for NLG"

36 / 86 papers shown
Learning to Rationalize for Nonmonotonic Reasoning with Distant Supervision
Faeze Brahman
Vered Shwartz
Rachel Rudinger
Yejin Choi
LRM
15
42
0
14 Dec 2020
Exploring Question-Specific Rewards for Generating Deep Questions
Yuxi Xie
Liangming Pan
Dongzhe Wang
Min-Yen Kan
Yansong Feng
46
27
0
02 Nov 2020
Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering
Arij Riabi
Thomas Scialom
Rachel Keraron
Benoît Sagot
Djamé Seddah
Jacopo Staiano
142
52
0
23 Oct 2020
Self-Supervised Contrastive Learning for Efficient User Satisfaction Prediction in Conversational Agents
Mohammad Kachuee
Hao Yuan
Young-Bum Kim
Sungjin Lee
19
25
0
21 Oct 2020
PARENTing via Model-Agnostic Reinforcement Learning to Correct Pathological Behaviors in Data-to-Text Generation
Clément Rebuffel
Laure Soulier
Geoffrey Scoutheeten
Patrick Gallinari
8
9
0
21 Oct 2020
Reformulating Unsupervised Style Transfer as Paraphrase Generation
Kalpesh Krishna
John Wieting
Mohit Iyyer
19
237
0
12 Oct 2020
What Have We Achieved on Text Summarization?
Dandan Huang
Leyang Cui
Sen Yang
Guangsheng Bao
Kun Wang
Jun Xie
Yue Zhang
29
109
0
09 Oct 2020
Toward Stance-based Personas for Opinionated Dialogues
Thomas Scialom
Serra Sinem Tekiroğlu
Jacopo Staiano
Marco Guerini
20
9
0
07 Oct 2020
Shimon the Rapper: A Real-Time System for Human-Robot Interactive Rap Battles
Richard J. Savery
Lisa Zahray
Gil Weinberg
4
17
0
19 Sep 2020
GLUCOSE: GeneraLized and COntextualized Story Explanations
N. Mostafazadeh
Aditya Kalyanpur
Lori Moon
David W. Buchanan
Lauren Berkowitz
Or Biran
Jennifer Chu-Carroll
19
121
0
16 Sep 2020
A Survey of Evaluation Metrics Used for NLG Systems
Ananya B. Sai
Akash Kumar Mohankumar
Mitesh M. Khapra
ELM
25
228
0
27 Aug 2020
Evaluation of Text Generation: A Survey
Asli Celikyilmaz
Elizabeth Clark
Jianfeng Gao
ELM
LM&MA
19
376
0
26 Jun 2020
Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions
Stephen Roller
Y-Lan Boureau
Jason Weston
Antoine Bordes
Emily Dinan
...
Kurt Shuster
Eric Michael Smith
Arthur Szlam
Jack Urbanek
Mary Williamson
LLMAG
AI4CE
20
51
0
22 Jun 2020
Beyond User Self-Reported Likert Scale Ratings: A Comparison Model for Automatic Dialog Evaluation
Weixin Liang
James Y. Zou
Zhou Yu
ELM
28
33
0
21 May 2020
History for Visual Dialog: Do we really need it?
Shubham Agarwal
Trung Bui
Joon-Young Lee
Ioannis Konstas
Verena Rieser
VLM
11
69
0
08 May 2020
FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization
Esin Durmus
He He
Mona T. Diab
HILM
6
384
0
07 May 2020
Exploring Content Selection in Summarization of Novel Chapters
Faisal Ladhak
Bryan Li
Yaser Al-Onaizan
Kathleen McKeown
61
35
0
04 May 2020
Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward
Luyang Huang
Lingfei Wu
Lu Wang
RALM
29
161
0
03 May 2020
Stay Hungry, Stay Focused: Generating Informative and Specific Questions in Information-Seeking Conversations
Peng Qi
Yuhao Zhang
Christopher D. Manning
14
38
0
30 Apr 2020
A Human Evaluation of AMR-to-English Generation Systems
Emma Manning
Shira Wein
Nathan Schneider
30
18
0
14 Apr 2020
BLEURT: Learning Robust Metrics for Text Generation
Thibault Sellam
Dipanjan Das
Ankur P. Parikh
46
1,439
0
09 Apr 2020
Towards a Human-like Open-Domain Chatbot
Daniel De Freitas
Minh-Thang Luong
David R. So
Jamie Hall
Noah Fiedel
...
Zi Yang
Apoorv Kulshreshtha
Gaurav Nemade
Yifeng Lu
Quoc V. Le
28
923
0
27 Jan 2020
Paraphrase Generation with Latent Bag of Words
Yao Fu
Yansong Feng
John P. Cunningham
BDL
25
91
0
07 Jan 2020
How Decoding Strategies Affect the Verifiability of Generated Text
Luca Massarelli
Fabio Petroni
Aleksandra Piktus
Myle Ott
Tim Rocktäschel
Vassilis Plachouras
Fabrizio Silvestri
Sebastian Riedel
23
50
0
09 Nov 2019
Do Massively Pretrained Language Models Make Better Storytellers?
A. See
Aneesh S. Pappu
Rohun Saxena
Akhila Yerukola
Christopher D. Manning
37
166
0
24 Sep 2019
ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons
Margaret Li
Jason Weston
Stephen Roller
21
175
0
06 Sep 2019
Handling Divergent Reference Texts when Evaluating Table-to-Text Generation
Bhuwan Dhingra
Manaal Faruqui
Ankur P. Parikh
Ming-Wei Chang
Dipanjan Das
William W. Cohen
18
194
0
03 Jun 2019
Triple-to-Text: Converting RDF Triples into High-Quality Natural Languages via Optimizing an Inverse KL Divergence
Yaoming Zhu
Juncheng Wan
Zhiming Zhou
Liheng Chen
Lin Qiu
Weinan Zhang
Xin Jiang
Yong Yu
20
27
0
25 May 2019
Towards Coherent and Engaging Spoken Dialog Response Generation Using Automatic Conversation Evaluators
Sanghyun Yi
Rahul Goel
Chandra Khatri
Alessandra Cervone
Tagyoung Chung
Behnam Hedayatnia
Anu Venkatesh
Raefer Gabriel
Dilek Z. Hakkani-Tür
20
60
0
30 Apr 2019
Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge
Ondrej Dusek
Jekaterina Novikova
Verena Rieser
ELM
32
231
0
23 Jan 2019
Sequence-to-Sequence Models for Data-to-Text Natural Language Generation: Word- vs. Character-based Processing and Output Diversity
Glorianna Jagfeld
Sabrina Jenne
Ngoc Thang Vu
AIMat
33
24
0
11 Oct 2018
Findings of the E2E NLG Challenge
Ondrej Dusek
Jekaterina Novikova
Verena Rieser
18
115
0
02 Oct 2018
The price of debiasing automatic metrics in natural language evaluation
Arun Tejasvi Chaganty
Stephen Mussmann
Percy Liang
11
113
0
06 Jul 2018
RankME: Reliable Human Ratings for Natural Language Generation
Jekaterina Novikova
Ondrej Dusek
Verena Rieser
ALM
19
108
0
15 Mar 2018
Zero-Shot Question Generation from Knowledge Graphs for Unseen Predicates and Entity Types
Hady ElSahar
Christophe Gravier
F. Laforest
BDL
22
80
0
19 Feb 2018
Adversarial Evaluation of Dialogue Models
Anjuli Kannan
Oriol Vinyals
AAML
ALM
133
76
0
27 Jan 2017