Learning an Unreferenced Metric for Online Dialogue Evaluation

Annual Meeting of the Association for Computational Linguistics (ACL), 2020

1 May 2020

Koustuv Sinha

Prasanna Parthasarathi

Papers citing "Learning an Unreferenced Metric for Online Dialogue Evaluation"

50 / 57 papers shown

Evaluating LLM-Generated Versus Human-Authored Responses in Role-Play Dialogues

Dongxu Lu

Johan Jeuring

Albert Gatt

274

22 Sep 2025

BoK: Introducing Bag-of-Keywords Loss for Interpretable Dialogue Response Generation

SIGDIAL Conferences (SIGDIAL), 2025

Suvodip Dey

M. Desarkar

OffRL

345

20 Jan 2025

Interaction Matters: An Evaluation Framework for Interactive Dialogue Assessment on English Second Language Conversations

Rena Gao

Carsten Roever

Jey Han Lau

252

09 Jul 2024

Favi-Score: A Measure for Favoritism in Automated Preference Ratings for Generative AI Evaluation

269

03 Jun 2024

Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators

543

727

06 Apr 2024

Only Send What You Need: Learning to Communicate Efficiently in Federated Multilingual Machine Translation

IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024

Yun-Wei Chu

Dong-Jun Han

Christopher G. Brinton

383

15 Jan 2024

A Survey of Personality, Persona, and Profile in Conversational Agents and Chatbots

Richard Sutcliffe

488

31 Dec 2023

CoAScore: Chain-of-Aspects Prompting for NLG Evaluation

Peiyuan Gong

Jiaxin Mao

ELM

365

16 Dec 2023

Dialogue Quality and Emotion Annotations for Customer Support Conversations

IEEE Games Entertainment Media Conference (IEEE GEM), 2023

204

23 Nov 2023

Automatic Evaluation of Generative Models with Instruction Tuning

IEEE Games Entertainment Media Conference (IEEE GEM), 2023

Shuhaib Mehri

Vered Shwartz

ELM ALM

185

30 Oct 2023

DiQAD: A Benchmark Dataset for End-to-End Open-domain Dialogue Assessment

179

25 Oct 2023

xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Chen Zhang

Haizhou Li

247

13 Oct 2023

RADE: Reference-Assisted Dialogue Evaluation for Open-Domain Dialogue

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

341

15 Sep 2023

Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation

298

31 Aug 2023

Towards Multilingual Automatic Dialogue Evaluation

SIGDIAL Conferences (SIGDIAL), 2023

John Mendonça

A. Lavie

Isabel Trancoso

206

31 Aug 2023

Three Ways of Using Large Language Models to Evaluate Chat

234

12 Aug 2023

Correction of Errors in Preference Ratings from Automated Metrics for Text Generation

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

253

06 Jun 2023

Evaluating Open-Domain Dialogues in Latent Space with Next Sentence Prediction and Mutual Information

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

298

26 May 2023

What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

536

19 May 2023

DEnsity: Open-domain Dialogue Evaluation Metric using Density Estimation

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

272

08 May 2023

Improving Open-Domain Dialogue Evaluation with a Causal Inference Model

Yang Liu

218

31 Jan 2023

PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022

Chen Zhang

L. F. D’Haro

Qiquan Zhang

Thomas Friedrichs

Haizhou Li

212

18 Dec 2022

FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Chen Zhang

L. F. D’Haro

Qiquan Zhang

Thomas Friedrichs

Haizhou Li

242

25 Oct 2022

Measuring and Improving Semantic Diversity of Dialogue Generation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Seungju Han

Beomsu Kim

Buru Chang

220

11 Oct 2022

An Equal-Size Hard EM Algorithm for Diverse Dialogue Generation

International Conference on Learning Representations (ICLR), 2022

351

29 Sep 2022

Open-Domain Dialog Evaluation using Follow-Ups Likelihood

International Conference on Computational Linguistics (COLING), 2022

242

12 Sep 2022

Evaluation of Question Answering Systems: Complexity of judging a natural language

ACM Computing Surveys (ACM CSUR), 2022

319

10 Sep 2022

Interactive Evaluation of Dialog Track at DSTC9

International Conference on Language Resources and Evaluation (LREC), 2022

366

28 Jul 2022

MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue

169

19 Jun 2022

Why is constrained neural language generation particularly challenging?

Cristina Garbacea

Qiaozhu Mei

486

11 Jun 2022

InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

411

25 May 2022

What should I Ask: A Knowledge-driven Approach for Follow-up Questions Generation in Conversational Surveys

Pacific Asia Conference on Language, Information and Computation (PACLIC), 2022

Heng Ji

393

23 May 2022

CORAL: Contextual Response Retrievability Loss Function for Training Dialog Generation Models

320

21 May 2022

Meet Your Favorite Character: Open-domain Chatbot Mimicking Fictional Characters with only a Few Utterances

North American Chapter of the Association for Computational Linguistics (NAACL), 2022

278

22 Apr 2022

Persona-Guided Planning for Controlling the Protagonist's Persona in Story Generation

North American Chapter of the Association for Computational Linguistics (NAACL), 2022

199

22 Apr 2022

Spurious Correlations in Reference-Free Evaluation of Text Generation

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Esin Durmus

Faisal Ladhak

Tatsunori Hashimoto

184

21 Apr 2022

What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation

Findings (Findings), 2022

Sarik Ghazarian

Behnam Hedayatnia

Alexandros Papangelis

Yang Liu

Dilek Z. Hakkani-Tür

259

25 Mar 2022

Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges

...

Chen Zhang

273

18 Mar 2022

Probing the Robustness of Trained Metrics for Conversational Dialogue Systems

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

168

28 Feb 2022

MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation

Chen Zhang

L. F. D’Haro

Thomas Friedrichs

Haizhou Li

ELM

283

14 Dec 2021

Identifying Untrustworthy Samples: Data Filtering for Open-domain Dialogues with Bayesian Optimization

305

14 Sep 2021

Perturbation CheckLists for Evaluating NLG Evaluation Metrics

Mitesh M. Khapra

403

13 Sep 2021

POSSCORE: A Simple Yet Effective Evaluation of Conversational Search with Part of Speech Labelling

International Conference on Information and Knowledge Management (CIKM), 2021

Zeyang Liu

K. Zhou

Jiaxin Mao

Max L. Wilson

199

07 Sep 2021

Distilling the Knowledge of Large-scale Generative Models into Retrieval Models for Efficient Open-domain Conversation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

257

28 Aug 2021

Do Encoder Representations of Generative Dialogue Models Encode Sufficient Information about the Task ?

SIGDIAL Conferences (SIGDIAL), 2021

Prasanna Parthasarathi

J. Pineau

Sarath Chandar

263

20 Jun 2021

A Brief Study on the Effects of Training Generative Dialogue Models with a Semantic loss

SIGDIAL Conferences (SIGDIAL), 2021

Prasanna Parthasarathi

Mohamed Abdelsalam

J. Pineau

Sarath Chandar

145

20 Jun 2021

A Comprehensive Assessment of Dialog Evaluation Metrics

Yi-Ting Yeh

M. Eskénazi

Shikib Mehri

358

122

07 Jun 2021

DynaEval: Unifying Turn and Dialogue Level Evaluation

Annual Meeting of the Association for Computational Linguistics (ACL), 2021

Chen Zhang

Yiming Chen

Haizhou Li

227

02 Jun 2021

Towards Standard Criteria for human evaluation of Chatbots: A Survey

Hongru Liang

Huaqing Li

191

24 May 2021

Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey

Artificial Intelligence Review (AIR), 2021

1.0K

336

10 May 2021