How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation

25 March 2016
Chia-Wei Liu
Ryan J. Lowe
Iulian Serban
Michael Noseworthy
Laurent Charlin
Joelle Pineau

Papers citing "How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation"

50 / 220 papers shown
State-of-the-art in Open-domain Conversational AI: A Survey
Tosin P. Adewumi
F. Liwicki
Marcus Liwicki
26
15
0
02 May 2022
COSPLAY: Concept Set Guided Personalized Dialogue Generation Across Both Party Personas
Chengshi Xu
Pijian Li
Wei Wang
Haoran Yang
Siyun Wang
Chuangbai Xiao
25
26
0
02 May 2022
What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation
Sarik Ghazarian
Behnam Hedayatnia
Alexandros Papangelis
Yang Liu
Dilek Z. Hakkani-Tür
30
19
0
25 Mar 2022
Towards Large-Scale Interpretable Knowledge Graph Reasoning for Dialogue Systems
Yi-Lin Tuan
Sajjad Beygi
Maryam Fazel-Zarandi
Qiaozi Gao
Alessandra Cervone
William Yang Wang
LRM
21
23
0
20 Mar 2022
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges
Shikib Mehri
Jinho Choi
L. F. D’Haro
Jan Deriu
M. Eskénazi
...
David Traum
Yi-Ting Yeh
Zhou Yu
Yizhe Zhang
Chen Zhang
30
21
0
18 Mar 2022
Conversational Recommendation: A Grand AI Challenge
Dietmar Jannach
L. Chen
26
18
0
17 Mar 2022
Rethinking and Refining the Distinct Metric
Siyang Liu
Sahand Sabour
Yinhe Zheng
Pei Ke
Xiaoyan Zhu
Minlie Huang
28
10
0
28 Feb 2022
Towards Personalized Answer Generation in E-Commerce via Multi-Perspective Preference Modeling
Yang Deng
Yaliang Li
Wenxuan Zhang
Bolin Ding
W. Lam
27
36
0
27 Dec 2021
Understanding and Improving the Exemplar-based Generation for Open-domain Conversation
Seungju Han
Beomsu Kim
Seokjun Seo
Enkhbayar Erdenee
Buru Chang
30
3
0
13 Dec 2021
Am I Me or You? State-of-the-Art Dialogue Models Cannot Maintain an Identity
Kurt Shuster
Jack Urbanek
Arthur Szlam
Jason Weston
HILM
16
24
0
10 Dec 2021
CO-STAR: Conceptualisation of Stereotypes for Analysis and Reasoning
Teyun Kwon
Anandha Gopalan
25
2
0
01 Dec 2021
Learning to Predict Persona Information for Dialogue Personalization without Explicit Persona Description
Wangchunshu Zhou
Qifei Li
Chenle Li
13
9
0
30 Nov 2021
Automatic Evaluation and Moderation of Open-domain Dialogue Systems
Chen Zhang
João Sedoc
L. F. D’Haro
Rafael E. Banchs
Alexander I. Rudnicky
22
36
0
03 Nov 2021
EmpBot: A T5-based Empathetic Chatbot focusing on Sentiments
Emmanouil Zaranis
Georgios Paraskevopoulos
Athanasios Katsamanis
Alexandros Potamianos
30
9
0
30 Oct 2021
A Plug-and-Play Method for Controlled Text Generation
Damian Pascual
Béni Egressy
Clara Meister
Ryan Cotterell
Roger Wattenhofer
19
89
0
20 Sep 2021
Conversational Multi-Hop Reasoning with Neural Commonsense Knowledge and Symbolic Logic Rules
Forough Arabshahi
Jennifer Lee
Antoine Bosselut
Yejin Choi
Tom Michael Mitchell
LRM
21
17
0
17 Sep 2021
Identifying Untrustworthy Samples: Data Filtering for Open-domain Dialogues with Bayesian Optimization
Lei Shen
Haolan Zhan
Xin Shen
Hongshen Chen
Xiaofang Zhao
Xiao-Dan Zhu
32
17
0
14 Sep 2021
Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation
Zechen Bai
Yuta Nakashima
Noa Garcia
68
43
0
13 Sep 2021
CEM: Commonsense-aware Empathetic Response Generation
Sahand Sabour
Chujie Zheng
Minlie Huang
28
149
0
13 Sep 2021
Generating Personalized Dialogue via Multi-Task Meta-Learning
Jing Yang Lee
Kong Aik Lee
W. Gan
25
14
0
07 Aug 2021
How to Evaluate Your Dialogue Models: A Review of Approaches
Xinmeng Li
Wansen Wu
Long Qin
Quanjun Yin
ELM
30
8
0
03 Aug 2021
WeaSuL: Weakly Supervised Dialogue Policy Learning: Reward Estimation for Multi-turn Dialogue
Anant Khandelwal
OffRL
21
6
0
01 Aug 2021
An Evaluation of Generative Pre-Training Model-based Therapy Chatbot for Caregivers
Lu Wang
Munif Ishad Mujib
Jake Williams
G. Demiris
Jina Huh-Yoo
AI4MH
27
32
0
28 Jul 2021
Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable Features
Hannah Rashkin
David Reitter
Gaurav Singh Tomar
Dipanjan Das
167
101
0
14 Jul 2021
Productivity, Portability, Performance: Data-Centric Python
Yiheng Wang
Yao Zhang
Yanzhang Wang
Yan Wan
Jiao Wang
Zhongyuan Wu
Yuhao Yang
Bowen She
54
94
0
01 Jul 2021
All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text
Elizabeth Clark
Tal August
Sofia Serrano
Nikita Haduong
Suchin Gururangan
Noah A. Smith
DeLMO
36
394
0
30 Jun 2021
Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation
Prakhar Gupta
Yulia Tsvetkov
Jeffrey P. Bigham
34
22
0
10 Jun 2021
A Comprehensive Assessment of Dialog Evaluation Metrics
Yi-Ting Yeh
M. Eskénazi
Shikib Mehri
30
104
0
07 Jun 2021
GTM: A Generative Triple-Wise Model for Conversational Question Generation
Lei Shen
Fandong Meng
Jinchao Zhang
Yang Feng
Jie Zhou
19
13
0
07 Jun 2021
Generating Relevant and Coherent Dialogue Responses using Self-separated Conditional Variational AutoEncoders
Bin Sun
Shaoxiong Feng
Yiwei Li
Jiamou Liu
Kan Li
13
31
0
07 Jun 2021
Emotion-aware Chat Machine: Automatic Emotional Response Generation for Human-like Emotional Interaction
Wei Wei
Jiayi Liu
Xian-Ling Mao
G. Guo
Feida Zhu
Pan Zhou
Yuchong Hu
45
56
0
06 Jun 2021
DynaEval: Unifying Turn and Dialogue Level Evaluation
Chen Zhang
Yiming Chen
L. F. D’Haro
Yan Zhang
Thomas Friedrichs
Grandee Lee
Haizhou Li
24
73
0
02 Jun 2021
HERALD: An Annotation Efficient Method to Detect User Disengagement in Social Conversations
Weixin Liang
Kai-Hui Liang
Zhou Yu
34
15
0
01 Jun 2021
Empathetic Dialog Generation with Fine-Grained Intents
Yubo Xie
P. Pu
VLM
19
26
0
14 May 2021
Semi-Supervised Variational Reasoning for Medical Dialogue Generation
Dongdong Li
Z. Ren
Pengjie Ren
Zhumin Chen
M. Fan
Jun Ma
Maarten de Rijke
BDL
DRL
OffRL
MedIm
24
48
0
13 May 2021
Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey
Jinjie Ni
Tom Young
Vlad Pandelea
Fuzhao Xue
Erik Cambria
54
268
0
10 May 2021
LEGOEval: An Open-Source Toolkit for Dialogue System Evaluation via Crowdsourcing
Yu Li
Josh Arnold
Feifan Yan
Weiyan Shi
Zhou Yu
ELM
26
11
0
05 May 2021
Meta-evaluation of Conversational Search Evaluation Metrics
Zeyang Liu
K. Zhou
Max L. Wilson
ELM
24
17
0
27 Apr 2021
Code Structure Guided Transformer for Source Code Summarization
Shuzheng Gao
Cuiyun Gao
Yulan He
Jichuan Zeng
L. Nie
Xin Xia
Michael R. Lyu
22
96
0
19 Apr 2021
Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation
Max Bartolo
Tristan Thrush
Robin Jia
Sebastian Riedel
Pontus Stenetorp
Douwe Kiela
AAML
17
103
0
18 Apr 2021
Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems
Derek Chen
Howard Chen
Yi Yang
A. Lin
Zhou Yu
17
65
0
01 Apr 2021
Advances and Challenges in Conversational Recommender Systems: A Survey
Chongming Gao
Wenqiang Lei
Xiangnan He
Maarten de Rijke
Tat-Seng Chua
136
273
0
23 Jan 2021
Towards Facilitating Empathic Conversations in Online Mental Health Support: A Reinforcement Learning Approach
Ashish Sharma
Inna Wanyin Lin
Adam S. Miner
David C. Atkins
Tim Althoff
AI4MH
25
138
0
19 Jan 2021
CRSLab: An Open-Source Toolkit for Building Conversational Recommender System
Kun Zhou
Xiaolei Wang
Yuanhang Zhou
Chenzhang Shang
Yuan Cheng
Wayne Xin Zhao
Yaliang Li
Ji-Rong Wen
27
63
0
04 Jan 2021
Writing Polishment with Simile: Task, Dataset and A Neural Approach
Jiayi Zhang
Zhi Cui
Xiaoqiang Xia
Yalong Guo
Yanran Li
Chen Wei
Jianwei Cui
20
17
0
15 Dec 2020
Target Guided Emotion Aware Chat Machine
Wei Wei
Jiayi Liu
Xian-Ling Mao
G. Guo
Feida Zhu
Pan Zhou
Yuchong Hu
Shanshan Feng
22
24
0
15 Nov 2020
Refer, Reuse, Reduce: Generating Subsequent References in Visual and Conversational Contexts
Ece Takmaz
Mario Giulianelli
Sandro Pezzelle
Arabella J. Sinclair
Raquel Fernández
15
26
0
09 Nov 2020
Exploring Question-Specific Rewards for Generating Deep Questions
Yuxi Xie
Liangming Pan
Dongzhe Wang
Min-Yen Kan
Yansong Feng
48
27
0
02 Nov 2020
Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems
Vitou Phy
Yang Zhao
Akiko Aizawa
14
55
0
01 Nov 2020
PowerTransformer: Unsupervised Controllable Revision for Biased Language Correction
Xinyao Ma
Maarten Sap
Hannah Rashkin
Yejin Choi
30
73
0
26 Oct 2020