How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation

25 March 2016
Chia-Wei Liu
Ryan J. Lowe
Iulian Serban
Michael Noseworthy
Laurent Charlin
Joelle Pineau

Papers citing "How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation"

50 / 220 papers shown
Self-Supervised Contrastive Learning for Efficient User Satisfaction Prediction in Conversational Agents
Mohammad Kachuee
Hao Yuan
Young-Bum Kim
Sungjin Lee
19
25
0
21 Oct 2020
PARENTing via Model-Agnostic Reinforcement Learning to Correct Pathological Behaviors in Data-to-Text Generation
Clément Rebuffel
Laure Soulier
Geoffrey Scoutheeten
Patrick Gallinari
8
9
0
21 Oct 2020
Local Knowledge Powered Conversational Agents
Sashank Santhanam
Ming-Yu Liu
Raul Puri
M. Shoeybi
M. Patwary
Bryan Catanzaro
21
4
0
20 Oct 2020
Cue Me In: Content-Inducing Approaches to Interactive Story Generation
Faeze Brahman
Alexandru Petrusca
Snigdha Chaturvedi
LRM
16
20
0
20 Oct 2020
Reformulating Unsupervised Style Transfer as Paraphrase Generation
Kalpesh Krishna
John Wieting
Mohit Iyyer
19
237
0
12 Oct 2020
Plan ahead: Self-Supervised Text Planning for Paragraph Completion Task
Dongyeop Kang
Eduard H. Hovy
LRM
40
24
0
11 Oct 2020
Like hiking? You probably enjoy nature: Persona-grounded Dialog with Commonsense Expansions
Bodhisattwa Prasad Majumder
Harsh Jhamtani
Taylor Berg-Kirkpatrick
Julian McAuley
22
85
0
07 Oct 2020
Regularizing Dialogue Generation by Imitating Implicit Scenarios
Shaoxiong Feng
Xuancheng Ren
Hongshen Chen
Bin Sun
Kan Li
Xu Sun
18
20
0
05 Oct 2020
MIME: MIMicking Emotions for Empathetic Response Generation
Navonil Majumder
Pengfei Hong
Shanshan Peng
Jiankun Lu
Deepanway Ghosal
Alexander Gelbukh
Rada Mihalcea
Soujanya Poria
23
200
0
04 Oct 2020
Predicting User Engagement Status for Online Evaluation of Intelligent Assistants
Rui Meng
Zhen Yue
A. Glass
13
2
0
01 Oct 2020
Pchatbot: A Large-Scale Dataset for Personalized Chatbot
Hongjin Qian
Xiaohe Li
Hanxun Zhong
Yu Guo
Yueyuan Ma
Yutao Zhu
Zhanliang Liu
Zhicheng Dou
Ji-Rong Wen
38
43
0
28 Sep 2020
Enhancing Dialogue Generation via Multi-Level Contrastive Learning
Xin Li
Piji Li
Yan Wang
Xiaojiang Liu
Wai Lam
26
5
0
19 Sep 2020
GLUCOSE: GeneraLized and COntextualized Story Explanations
N. Mostafazadeh
Aditya Kalyanpur
Lori Moon
David W. Buchanan
Lauren Berkowitz
Or Biran
Jennifer Chu-Carroll
19
121
0
16 Sep 2020
UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation
Jian-Yu Guan
Minlie Huang
21
69
0
16 Sep 2020
A Survey of Evaluation Metrics Used for NLG Systems
Ananya B. Sai
Akash Kumar Mohankumar
Mitesh M. Khapra
ELM
30
228
0
27 Aug 2020
Opinion-aware Answer Generation for Review-driven Question Answering in E-Commerce
Yang Deng
Wenxuan Zhang
Wai Lam
16
31
0
27 Aug 2020
CoreGen: Contextualized Code Representation Learning for Commit Message Generation
L. Nie
Cuiyun Gao
Zhicong Zhong
Wai Lam
Yang Liu
Zenglin Xu
21
46
0
14 Jul 2020
Evaluation of Text Generation: A Survey
Asli Celikyilmaz
Elizabeth Clark
Jianfeng Gao
ELM
LM&MA
19
376
0
26 Jun 2020
Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions
Stephen Roller
Y-Lan Boureau
Jason Weston
Antoine Bordes
Emily Dinan
...
Kurt Shuster
Eric Michael Smith
Arthur Szlam
Jack Urbanek
Mary Williamson
LLMAG
AI4CE
22
51
0
22 Jun 2020
Towards Unified Dialogue System Evaluation: A Comprehensive Analysis of Current Evaluation Protocols
Sarah E. Finch
Jinho D. Choi
ELM
23
67
0
10 Jun 2020
Report from the NSF Future Directions Workshop, Toward User-Oriented Agents: Research Directions and Challenges
M. Eskénazi
Tiancheng Zhao
LLMAG
AI4TS
AI4CE
36
9
0
10 Jun 2020
Probing Neural Dialog Models for Conversational Understanding
Abdelrhman Saleh
Tovly Deutsch
Stephen Casper
Yonatan Belinkov
Stuart M. Shieber
21
13
0
07 Jun 2020
Beyond User Self-Reported Likert Scale Ratings: A Comparison Model for Automatic Dialog Evaluation
Weixin Liang
James Zou
Zhou Yu
ELM
34
33
0
21 May 2020
SueNes: A Weakly Supervised Approach to Evaluating Single-Document Summarization via Negative Sampling
F. S. Bao
Hebi Li
Ge Luo
Minghui Qiu
Yinfei Yang
Youbiao He
Cen Chen
16
4
0
13 May 2020
Response-Anticipated Memory for On-Demand Knowledge Integration in Response Generation
Zhiliang Tian
Wei Bi
Dongkyu Lee
Lanqing Xue
Yiping Song
Xiaojiang Liu
N. Zhang
27
25
0
13 May 2020
History for Visual Dialog: Do we really need it?
Shubham Agarwal
Trung Bui
Joon-Young Lee
Ioannis Konstas
Verena Rieser
VLM
13
69
0
08 May 2020
FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization
Esin Durmus
He He
Mona T. Diab
HILM
6
384
0
07 May 2020
Learning an Unreferenced Metric for Online Dialogue Evaluation
Koustuv Sinha
Prasanna Parthasarathi
Jasmine Wang
Ryan J. Lowe
William L. Hamilton
Joelle Pineau
OffRL
21
84
0
01 May 2020
KPQA: A Metric for Generative Question Answering Using Keyphrase Weights
Hwanhee Lee
Seunghyun Yoon
Franck Dernoncourt
Doo Soon Kim
Trung Bui
Joongbo Shin
Kyomin Jung
16
0
0
01 May 2020
Question Rewriting for Conversational Question Answering
Svitlana Vakulenko
Shayne Longpre
Zhucheng Tu
R. Anantha
20
172
0
30 Apr 2020
Learning to Update Natural Language Comments Based on Code Changes
Sheena Panthaplackel
Pengyu Nie
Miloš Gligorić
Junyi Jessy Li
Raymond J. Mooney
27
63
0
25 Apr 2020
Experience Grounds Language
Yonatan Bisk
Ari Holtzman
Jesse Thomason
Jacob Andreas
Yoshua Bengio
...
Angeliki Lazaridou
Jonathan May
Aleksandr Nisnevich
Nicolas Pinto
Joseph P. Turian
19
350
0
21 Apr 2020
BLEURT: Learning Robust Metrics for Text Generation
Thibault Sellam
Dipanjan Das
Ankur P. Parikh
46
1,442
0
09 Apr 2020
Asking and Answering Questions to Evaluate the Factual Consistency of Summaries
Alex Wang
Kyunghyun Cho
M. Lewis
HILM
10
470
0
08 Apr 2020
A Survey on Conversational Recommender Systems
Dietmar Jannach
A. Manzoor
Wanling Cai
Li Chen
13
403
0
01 Apr 2020
XPersona: Evaluating Multilingual Personalized Chatbot
Zhaojiang Lin
Zihan Liu
Genta Indra Winata
Samuel Cahyawijaya
Andrea Madotto
Yejin Bang
Etsuko Ishii
Pascale Fung
45
57
0
17 Mar 2020
Posterior-GAN: Towards Informative and Coherent Response Generation with Posterior Generative Adversarial Network
Shaoxiong Feng
Hongshen Chen
Kan Li
Dawei Yin
GAN
49
25
0
04 Mar 2020
A Neural Topical Expansion Framework for Unstructured Persona-oriented Dialogue Generation
Minghong Xu
Piji Li
Haoran Yang
Pengjie Ren
Z. Ren
Zhumin Chen
Jun Ma
18
31
0
06 Feb 2020
Towards a Human-like Open-Domain Chatbot
Daniel De Freitas
Minh-Thang Luong
David R. So
Jamie Hall
Noah Fiedel
...
Zi Yang
Apoorv Kulshreshtha
Gaurav Nemade
Yifeng Lu
Quoc V. Le
30
923
0
27 Jan 2020
Paraphrase Generation with Latent Bag of Words
Yao Fu
Yansong Feng
John P. Cunningham
BDL
25
91
0
07 Jan 2020
Going Beneath the Surface: Evaluating Image Captioning for Grammaticality, Truthfulness and Diversity
Huiyuan Xie
Tom Sherborne
A. Kuhnle
Ann A. Copestake
DiffM
19
9
0
19 Dec 2019
Plug and Play Language Models: A Simple Approach to Controlled Text Generation
Sumanth Dathathri
Andrea Madotto
Janice Lan
Jane Hung
Eric Frank
Piero Molino
J. Yosinski
Rosanne Liu
KELM
26
937
0
04 Dec 2019
Task-Oriented Dialog Systems that Consider Multiple Appropriate Responses under the Same Context
Yichi Zhang
Zhijian Ou
Zhou Yu
19
182
0
24 Nov 2019
Social Bias Frames: Reasoning about Social and Power Implications of Language
Maarten Sap
Saadia Gabriel
Lianhui Qin
Dan Jurafsky
Noah A. Smith
Yejin Choi
28
483
0
10 Nov 2019
Automatic Reminiscence Therapy for Dementia
Mariona Carós
M. Garolera
P. Radeva
Xavier Giró-i-Nieto
21
40
0
25 Oct 2019
Analyzing the Forgetting Problem in the Pretrain-Finetuning of Dialogue Response Models
Tianxing He
Jun Liu
Kyunghyun Cho
Myle Ott
Bing-Quan Liu
James R. Glass
Fuchun Peng
CLL
29
9
0
16 Oct 2019
Learning from Fact-checkers: Analysis and Generation of Fact-checking Language
Nguyen Vo
Kyumin Lee
9
68
0
05 Oct 2019
DyKgChat: Benchmarking Dialogue Generation Grounding on Dynamic Knowledge Graphs
Yi-Lin Tuan
Yun-Nung (Vivian) Chen
Hung-yi Lee
18
71
0
01 Oct 2019
Do Massively Pretrained Language Models Make Better Storytellers?
A. See
Aneesh S. Pappu
Rohun Saxena
Akhila Yerukola
Christopher D. Manning
37
166
0
24 Sep 2019
Counterfactual Story Reasoning and Generation
Lianhui Qin
Antoine Bosselut
Ari Holtzman
Chandra Bhagavatula
Elizabeth Clark
Yejin Choi
LRM
11
140
0
09 Sep 2019