A Comprehensive Assessment of Dialog Evaluation Metrics

7 June 2021

Papers citing "A Comprehensive Assessment of Dialog Evaluation Metrics"

Distribution Aware Metrics for Conditional Natural Language Generation David M. Chan Yiming Ni David A. Ross Sudheendra Vijayanarasimhan Austin Myers John F. Canny 35 4 0 15 Sep 2022
Open-Domain Dialog Evaluation using Follow-Ups Likelihood Maxime De Bruyn Ehsan Lotfi Jeska Buhmann Walter Daelemans 26 9 0 12 Sep 2022
SelF-Eval: Self-supervised Fine-grained Dialogue Evaluation Longxuan Ma Ziyu Zhuang Weinan Zhang Mingda Li Ting Liu 19 4 0 17 Aug 2022
Relevance in Dialogue: Is Less More? An Empirical Comparison of Existing Metrics, and a Novel Simple Metric Ian Berlot-Attwell Frank Rudzicz 15 1 0 03 Jun 2022
InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning Prakhar Gupta Cathy Jiao Yi-Ting Yeh Shikib Mehri M. Eskénazi Jeffrey P. Bigham ALM 36 47 0 25 May 2022
What should I Ask: A Knowledge-driven Approach for Follow-up Questions Generation in Conversational Surveys Yubin Ge Ziang Xiao Jana Diesner Heng Ji Karrie Karahalios Hari Sundaram 43 12 0 23 May 2022
CORAL: Contextual Response Retrievability Loss Function for Training Dialog Generation Models Bishal Santra Ravi Ghadia Manish Gupta Pawan Goyal OffRL 15 0 0 21 May 2022
Empathetic Conversational Systems: A Review of Current Advances, Gaps, and Opportunities Aravind Sesagiri Raamkumar Yinping Yang 12 28 0 09 May 2022
Spurious Correlations in Reference-Free Evaluation of Text Generation Esin Durmus Faisal Ladhak Tatsunori Hashimoto 14 30 0 21 Apr 2022
TRUE: Re-evaluating Factual Consistency Evaluation Or Honovich Roee Aharoni Jonathan Herzig Hagai Taitelbaum Doron Kukliansky Vered Cohen Thomas Scialom Idan Szpektor Avinatan Hassidim Yossi Matias HILM 27 3 0 11 Apr 2022
Quality Assurance of Generative Dialog Models in an Evolving Conversational Agent Used for Swedish Language Practice Markus Borg J. Bengtsson Harald Österling Alexander Hagelborn Isabella Gagner Piotr Tomaszewski 4 1 0 29 Mar 2022
What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation Sarik Ghazarian Behnam Hedayatnia Alexandros Papangelis Yang Liu Dilek Z. Hakkani-Tür 22 19 0 25 Mar 2022
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges Shikib Mehri Jinho Choi L. F. D’Haro Jan Deriu M. Eskénazi ... David Traum Yi-Ting Yeh Zhou Yu Yizhe Zhang Chen Zhang 28 21 0 18 Mar 2022
DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations Sarik Ghazarian Nuan Wen Aram Galstyan Nanyun Peng 11 34 0 18 Mar 2022
Probing the Robustness of Trained Metrics for Conversational Dialogue Systems Jan Deriu Don Tuggener Pius von Däniken Mark Cieliebak AAML 8 9 0 28 Feb 2022
FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows Jianqiao Zhao Yanyang Li Wanyu Du Yangfeng Ji Dong Yu M. Lyu Liwei Wang 12 4 0 14 Feb 2022
Mental Health Assessment for the Chatbots Yong Shan Jinchao Zhang Zekang Li Yang Feng Jie Zhou AI4MH 11 3 0 14 Jan 2022
Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents Eric Michael Smith Orion Hsu Rebecca Qian Stephen Roller Y-Lan Boureau Jason Weston 19 66 0 12 Jan 2022
MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation Chen Zhang L. F. D’Haro Thomas Friedrichs Haizhou Li ELM 9 18 0 14 Dec 2021
User Response and Sentiment Prediction for Automatic Dialogue Evaluation Sarik Ghazarian Behnam Hedayatnia Alexandros Papangelis Yang Liu Dilek Z. Hakkani-Tür 9 3 0 16 Nov 2021
Automatic Evaluation and Moderation of Open-domain Dialogue Systems Chen Zhang João Sedoc L. F. D’Haro Rafael E. Banchs Alexander I. Rudnicky 20 36 0 03 Nov 2021
Modeling Performance in Open-Domain Dialogue with PARADISE M. Walker Colin Harmon James Graupera Davan Harrison S. Whittaker 12 7 0 21 Oct 2021
Think Before You Speak: Explicitly Generating Implicit Commonsense Knowledge for Response Generation Pei Zhou Karthik Gopalakrishnan Behnam Hedayatnia Seokhwan Kim Jay Pujara Xiang Ren Yang Liu Dilek Z. Hakkani-Tür 34 40 0 16 Oct 2021
Jurassic is (almost) All You Need: Few-Shot Meaning-to-Text Generation for Open-Domain Dialogue Lena Reed Cecilia Li Angela Ramirez Liren Wu M. Walker 21 7 0 15 Oct 2021
Exploring Dense Retrieval for Dialogue Response Selection Tian Lan Deng Cai Yan Wang Yixuan Su Heyan Huang Xian-Ling Mao 109 16 0 13 Oct 2021
Investigating the Impact of Pre-trained Language Models on Dialog Evaluation Chen Zhang L. F. D’Haro Yiming Chen Thomas Friedrichs Haizhou Li 13 5 0 05 Oct 2021
Improving Stack Overflow question title generation with copying enhanced CodeBERT model and bi-modal information Fengji Zhang Xiao Yu J. Keung Fuyang Li Zhiwen Xie Zhen Yang Caoyuan Ma Zhimin Zhang 35 25 0 27 Sep 2021
Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark Nouha Dziri Hannah Rashkin Tal Linzen David Reitter ALM 185 79 0 30 Apr 2021
Dialogue-adaptive Language Model Pre-training From Quality Estimation Junlong Li Zhuosheng Zhang Hai Zhao OffRL 19 12 0 10 Sep 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 294 6,950 0 20 Apr 2018
Efficient Estimation of Word Representations in Vector Space Tomáš Mikolov Kai Chen G. Corrado J. Dean 3DV 228 31,244 0 16 Jan 2013