
USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation

Annual Meeting of the Association for Computational Linguistics (ACL), 2020
1 May 2020
Shikib Mehri
M. Eskénazi

Papers citing "USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation"

50 / 161 papers shown
FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Chen Zhang
L. F. D’Haro
Qiquan Zhang
Thomas Friedrichs
Haizhou Li
25 Oct 2022
On the Effectiveness of Automated Metrics for Text Generation Systems
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Pius von Daniken
Jan Deriu
Don Tuggener
Mark Cieliebak
24 Oct 2022
On the Limitations of Reference-Free Evaluations of Generated Text
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Daniel Deutsch
Rotem Dror
Dan Roth
22 Oct 2022
Deepfake Text Detection: Limitations and Opportunities
IEEE Symposium on Security and Privacy (IEEE S&P), 2022
Jiameng Pu
Zain Sarwar
Sifat Muhammad Abdullah
A. Rehman
Yoonjin Kim
P. Bhattacharya
M. Javed
Bimal Viswanath
17 Oct 2022
Towards a Unified Multi-Dimensional Evaluator for Text Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Ming Zhong
Yang Liu
Da Yin
Yuning Mao
Yizhu Jiao
Peng Liu
Chenguang Zhu
Heng Ji
Jiawei Han
13 Oct 2022
Evaluating Agent Interactions Through Episodic Knowledge Graphs
Selene Báez Santamaría
Piek Vossen
T. Baier
22 Sep 2022
Open-Domain Dialog Evaluation using Follow-Ups Likelihood
International Conference on Computational Linguistics (COLING), 2022
Maxime De Bruyn
Ehsan Lotfi
Jeska Buhmann
Walter Daelemans
12 Sep 2022
Dialogue Evaluation with Offline Reinforcement Learning
SIGDIAL Conference (SIGDIAL), 2022
Nurul Lubis
Christian Geishauser
Hsien-Chin Lin
Carel van Niekerk
Michael Heck
Shutong Feng
Milica Gavsić
02 Sep 2022
The Glass Ceiling of Automatic Evaluation in Natural Language Generation
International Joint Conference on Natural Language Processing (IJCNLP), 2022
Pierre Colombo
Maxime Peyrard
Nathan Noiry
Robert West
Pablo Piantanida
31 Aug 2022
SelF-Eval: Self-supervised Fine-grained Dialogue Evaluation
International Conference on Computational Linguistics (COLING), 2022
Longxuan Ma
Ziyu Zhuang
Weinan Zhang
Mingda Li
Ting Liu
17 Aug 2022
Interactive Evaluation of Dialog Track at DSTC9
International Conference on Language Resources and Evaluation (LREC), 2022
Shikib Mehri
Yulan Feng
Carla Gordon
S. Alavi
David Traum
M. Eskénazi
28 Jul 2022
MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue
Pengfei Zhang
Xiao-fei Hu
Kaidong Yu
Jian Wang
Song-Bo Han
Cao Liu
C. Yuan
19 Jun 2022
Relevance in Dialogue: Is Less More? An Empirical Comparison of Existing Metrics, and a Novel Simple Metric
Ian Berlot-Attwell
Frank Rudzicz
03 Jun 2022
InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Prakhar Gupta
Cathy Jiao
Yi-Ting Yeh
Shikib Mehri
M. Eskénazi
Jeffrey P. Bigham
25 May 2022
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges
Shikib Mehri
Jinho Choi
L. F. D’Haro
Jan Deriu
M. Eskénazi
...
David Traum
Yi-Ting Yeh
Zhou Yu
Yizhe Zhang
Chen Zhang
18 Mar 2022
Achieving Reliable Human Assessment of Open-Domain Dialogue Systems
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Tianbo Ji
Yvette Graham
Gareth J. F. Jones
Chenyang Lyu
Qun Liu
11 Mar 2022
Recent Advances in Neural Text Generation: A Task-Agnostic Survey
Chen Tang
Frank Guerin
Chenghua Lin
06 Mar 2022
Probing the Robustness of Trained Metrics for Conversational Dialogue Systems
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Jan Deriu
Don Tuggener
Pius von Daniken
Mark Cieliebak
28 Feb 2022
FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Jianqiao Zhao
Yanyang Li
Wanyu Du
Yangfeng Ji
Dong Yu
Michael R. Lyu
Liwei Wang
14 Feb 2022
What are the best systems? New perspectives on NLP Benchmarking
Pierre Colombo
Nathan Noiry
Ekhine Irurozki
Nathan Huet
08 Feb 2022
LaMDA: Language Models for Dialog Applications
R. Thoppilan
Daniel De Freitas
Jamie Hall
Noam M. Shazeer
Apoorv Kulshreshtha
...
Blaise Aguera-Arcas
Claire Cui
M. Croak
Ed H. Chi
Quoc Le
20 Jan 2022
Mental Health Assessment for the Chatbots
Yong Shan
Jinchao Zhang
Zekang Li
Yang Feng
Jie Zhou
14 Jan 2022
Measuring Attribution in Natural Language Generation Models
Computational Linguistics (CL), 2021
Hannah Rashkin
Vitaly Nikolaev
Matthew Lamm
Lora Aroyo
Michael Collins
Dipanjan Das
Slav Petrov
Gaurav Singh Tomar
Iulia Turc
David Reitter
23 Dec 2021
Ditch the Gold Standard: Re-evaluating Conversational Question Answering
Huihan Li
Tianyu Gao
Manan Goenka
Danqi Chen
16 Dec 2021
MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation
Chen Zhang
L. F. D’Haro
Thomas Friedrichs
Haizhou Li
14 Dec 2021
Automatic Evaluation and Moderation of Open-domain Dialogue Systems
Chen Zhang
João Sedoc
L. F. D’Haro
Rafael E. Banchs
Alexander I. Rudnicky
03 Nov 2021
Modeling Performance in Open-Domain Dialogue with PARADISE
M. Walker
Colin Harmon
James Graupera
Davan Harrison
S. Whittaker
21 Oct 2021
Better than Average: Paired Evaluation of NLP Systems
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Maxime Peyrard
Wei Zhao
Steffen Eger
Robert West
20 Oct 2021
Investigating the Impact of Pre-trained Language Models on Dialog Evaluation
Chen Zhang
L. F. D’Haro
Yiming Chen
Thomas Friedrichs
Haizhou Li
05 Oct 2021
Identifying Untrustworthy Samples: Data Filtering for Open-domain Dialogues with Bayesian Optimization
Lei Shen
Haolan Zhan
Xin Shen
Hongshen Chen
Xiaofang Zhao
Xiao-Dan Zhu
14 Sep 2021
Commonsense-Focused Dialogues for Response Generation: An Empirical Study
Pei Zhou
Karthik Gopalakrishnan
Behnam Hedayatnia
Seokhwan Kim
Jay Pujara
Xiang Ren
Yang Liu
Dilek Z. Hakkani-Tür
14 Sep 2021
Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation
Mingkai Deng
Bowen Tan
Zhengzhong Liu
Eric Xing
Zhiting Hu
14 Sep 2021
POSSCORE: A Simple Yet Effective Evaluation of Conversational Search with Part of Speech Labelling
International Conference on Information and Knowledge Management (CIKM), 2021
Zeyang Liu
K. Zhou
Jiaxin Mao
Max L. Wilson
07 Sep 2021
Language Model Augmented Relevance Score
Ruibo Liu
Jason W. Wei
Soroush Vosoughi
19 Aug 2021
How to Evaluate Your Dialogue Models: A Review of Approaches
Xinmeng Li
Wansen Wu
Long Qin
Quanjun Yin
03 Aug 2021
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling
Emily Dinan
Gavin Abercrombie
A. S. Bergman
Shannon L. Spruit
Dirk Hovy
Y-Lan Boureau
Verena Rieser
07 Jul 2021
Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation
Findings (Findings), 2021
Prakhar Gupta
Yulia Tsvetkov
Jeffrey P. Bigham
10 Jun 2021
Shades of BLEU, Flavours of Success: The Case of MultiWOZ
IEEE Games Entertainment Media Conference (IEEE GEM), 2021
Tomás Nekvinda
Ondrej Dusek
10 Jun 2021
A Comprehensive Assessment of Dialog Evaluation Metrics
Yi-Ting Yeh
M. Eskénazi
Shikib Mehri
07 Jun 2021
Improving Automated Evaluation of Open Domain Dialog via Diverse Reference Augmentation
Findings (Findings), 2021
Varun Gangal
Harsh Jhamtani
Eduard H. Hovy
Taylor Berg-Kirkpatrick
05 Jun 2021
DynaEval: Unifying Turn and Dialogue Level Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Chen Zhang
Yiming Chen
L. F. D’Haro
Yan Zhang
Thomas Friedrichs
Grandee Lee
Haizhou Li
02 Jun 2021
REAM♯: An Enhancement Approach to Reference-based Evaluation Metrics for Open-domain Dialog Generation
Findings (Findings), 2021
Jun Gao
Wei Bi
Ruifeng Xu
Shuming Shi
30 May 2021
Towards Standard Criteria for Human Evaluation of Chatbots: A Survey
Hongru Liang
Huaqing Li
24 May 2021
Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey
Artificial Intelligence Review (AIR), 2021
Jinjie Ni
Tom Young
Vlad Pandelea
Fuzhao Xue
Xiaoshi Zhong
10 May 2021
Assessing Dialogue Systems with Distribution Distances
Findings (Findings), 2021
Jiannan Xiang
Yahui Liu
Deng Cai
Huayang Li
Defu Lian
Lemao Liu
06 May 2021
Meta-evaluation of Conversational Search Evaluation Metrics
Zeyang Liu
K. Zhou
Max L. Wilson
27 Apr 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
18 Apr 2021
Q²: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Or Honovich
Leshem Choshen
Roee Aharoni
Ella Neeman
Idan Szpektor
Omri Abend
16 Apr 2021
Retrieval Augmentation Reduces Hallucination in Conversation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Kurt Shuster
Spencer Poff
Moya Chen
Douwe Kiela
Jason Weston
15 Apr 2021
Estimating Subjective Crowd-Evaluations as an Additional Objective to Improve Natural Language Generation
Jakob Nyberg
R. Manuvinakurike
Maike Paetzel-Prüsmann
12 Apr 2021