USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Shikib Mehri, M. Eskénazi
arXiv:2005.00456 (1 May 2020)

Papers citing "USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation"

Showing 50 of 161 citing papers (page 1 of 4)

MP2D: An Automated Topic Shift Dialogue Generation Framework Leveraging Knowledge Graphs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Yerin Hwang, Yongi-Mi Kim, Yunah Jang, Jeesoo Bang, Hyunkyung Bae, Kyomin Jung
09 Mar 2024

HD-Eval: Aligning Large Language Model Evaluators Through Hierarchical Criteria Decomposition
Yuxuan Liu, Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
24 Feb 2024

Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation
Dongjin Kang, Sunghwan Kim, Taeyoon Kwon, Seungjun Moon, Hyunsouk Cho, Youngjae Yu, Dongha Lee, Jinyoung Yeo
20 Feb 2024

Are LLM-based Evaluators Confusing NLG Quality Criteria?
Xinyu Hu, Mingqi Gao, Sen Hu, Yang Zhang, Yicheng Chen, Teng Xu, Xiaojun Wan
19 Feb 2024

Leveraging Large Language Models for NLG Evaluation: Advances and Challenges
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zhen Li, Xiaohan Xu, Tao Shen, Can Xu, Jia-Chen Gu, Yuxuan Lai, Chongyang Tao, Shuai Ma
13 Jan 2024

Rethinking Response Evaluation from Interlocutor's Eye for Open-Domain Dialogue Systems
International Joint Conference on Natural Language Processing (IJCNLP), 2024
Tsuta Yuma, Naoki Yoshinaga, Shoetsu Sato, Masashi Toyoda
04 Jan 2024

BatchEval: Towards Human-like Text Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Peiwen Yuan, Shaoxiong Feng, Yiwei Li, Xinglin Wang, Boyuan Pan, Heda Wang, Kan Li
31 Dec 2023

A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators
Chen Zhang, L. F. D’Haro, Yiming Chen, Malu Zhang, Haizhou Li
24 Dec 2023

CoAScore: Chain-of-Aspects Prompting for NLG Evaluation
Peiyuan Gong, Jiaxin Mao
16 Dec 2023

CESAR: Automatic Induction of Compositional Instructions for Multi-turn Dialogs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Taha İbrahim Aksu, Devamanyu Hazarika, Shikib Mehri, Seokhwan Kim, Dilek Z. Hakkani-Tür, Yang Liu, Mahdi Namazifar
29 Nov 2023

Fusion-Eval: Integrating Assistant Evaluators with LLMs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Lei Shu, Nevan Wichers, Liangchen Luo, Yun Zhu, Yinxiao Liu, Jindong Chen, Lei Meng
15 Nov 2023

X-Eval: Generalizable Multi-aspect Text Evaluation via Augmented Instruction Tuning with Auxiliary Evaluation Aspects
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Minqian Liu, Ying Shen, Zhiyang Xu, Yixin Cao, Eunah Cho, Vaibhav Kumar, Reza Ghanadan, Lifu Huang
15 Nov 2023

Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources
Yerin Hwang, Yongi-Mi Kim, Hyunkyung Bae, Jeesoo Bang, Hwanhee Lee, Kyomin Jung
09 Nov 2023

DialogBench: Evaluating LLMs as Human-like Dialogue Systems
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Jiao Ou, Junda Lu, Che Liu, Yihong Tang, Fuzheng Zhang, Chen Zhang, Kun Gai
03 Nov 2023

DiQAD: A Benchmark Dataset for End-to-End Open-domain Dialogue Assessment
Yukun Zhao, Lingyong Yan, Weiwei Sun, Chong Meng, Shuaiqiang Wang, Zhicong Cheng, Zhaochun Ren, D. Yin
25 Oct 2023

xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Chen Zhang, L. F. D’Haro, Chengguang Tang, Ke Shi, Guohua Tang, Haizhou Li
13 Oct 2023

A Closer Look into Automatic Evaluation Using Large Language Models
Cheng-Han Chiang, Hung-yi Lee
09 Oct 2023

Calibrating LLM-Based Evaluator
International Conference on Language Resources and Evaluation (LREC), 2023
Yuxuan Liu, Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
23 Sep 2023

RADE: Reference-Assisted Dialogue Evaluation for Open-Domain Dialogue
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhengliang Shi, Weiwei Sun, Shuo Zhang, Zhen Zhang, Sudipta Singha Roy, Zhaochun Ren
15 Sep 2023

Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation
J. Mendonça, Patrícia Pereira, Helena Moniz, João Paulo Carvalho, A. Lavie, Isabel Trancoso
31 Aug 2023

Towards Multilingual Automatic Dialogue Evaluation
SIGDIAL Conference (SIGDIAL), 2023
John Mendonça, A. Lavie, Isabel Trancoso
31 Aug 2023

GPTEval: A Survey on Assessments of ChatGPT and GPT-4
International Conference on Language Resources and Evaluation (LREC), 2023
Rui Mao, Guanyi Chen, Xulang Zhang, Frank Guerin, Xiaoshi Zhong
24 Aug 2023

ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shan Zhang, Jie Fu, Zhiyuan Liu
14 Aug 2023

Three Ways of Using Large Language Models to Evaluate Chat
Ondřej Plátek, Vojtěch Hudeček, Patrícia Schmidtová, Mateusz Lango, Ondřej Dušek
12 Aug 2023

Athena 2.0: Discourse and User Modeling in Open Domain Dialogue
Omkar Patil, Lena Reed, Kevin K. Bowden, Juraj Juraska, Wen Cui, ..., Phillip Lee, Jeshwanth Bheemanpally, Rohan Pandey, A. Ratnaparkhi, M. Walker
03 Aug 2023

LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Adian Liusie, Potsawee Manakul, Mark Gales
15 Jul 2023

DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Pei Ke, Fei Huang, Fei Mi, Yasheng Wang, Qun Liu, Xiaoyan Zhu, Shiyu Huang
13 Jul 2023

C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation
Workshop on Document-grounded Dialogue and Conversational Question Answering (DialDoc), 2023
Liliang Ren, Mankeerat Sidhu, Qi Zeng, R. Reddy, Heng Ji, Chengxiang Zhai
27 Jun 2023

Overview of Robust and Multilingual Automatic Evaluation Metrics for Open-Domain Dialogue Systems at DSTC 11 Track 4
Mario Rodríguez-Cantelar, Chen Zhang, Chengguang Tang, Ke Shi, Sarik Ghazarian, João Sedoc, L. F. D’Haro, Alexander I. Rudnicky
22 Jun 2023

MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
K. Murugesan, Sarathkrishna Swaminathan, Soham Dan, Subhajit Chaudhury, Chulaka Gunasekara, ..., Ibrahim Abdelaziz, Achille Fokoue, Pavan Kapanipathi, Salim Roukos, Alexander G. Gray
18 Jun 2023

Correction of Errors in Preference Ratings from Automated Metrics for Text Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jan Deriu, Pius von Däniken, Don Tuggener, Mark Cieliebak
06 Jun 2023

Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Tianyi Tang, Hongyuan Lu, Yuchen Eleanor Jiang, Haoyang Huang, Dongdong Zhang, Wayne Xin Zhao, Tom Kocmi, Furu Wei
24 May 2023

Psychological Metrics for Dialog System Evaluation
Salvatore Giorgi, Shreya Havaldar, Farhan S. Ahmed, Zuhaib Akhtar, Shalaka Vaidya, Gary Pan, Pallavi V. Kulkarni, H. Andrew Schwartz, João Sedoc
24 May 2023

Evaluate What You Can't Evaluate: Unassessable Quality for Generated Response
Yongkang Liu, Shi Feng, Daling Wang, Yifei Zhang, Hinrich Schütze
24 May 2023

Asking Clarification Questions to Handle Ambiguity in Open-Domain QA
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Dongryeol Lee, Segwang Kim, Minwoo Lee, Hwanhee Lee, Joonsuk Park, Sang-Woo Lee, Kyomin Jung
23 May 2023

LLM-Eval: Unified Multi-Dimensional Automatic Evaluation for Open-Domain Conversations with Large Language Models
Yen-Ting Lin, Yun-Nung Chen
23 May 2023

Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Anas Himmi, Ekhine Irurozki, Nathan Noiry, Nathan Huet, Pierre Colombo
17 May 2023

NLG Evaluation Metrics Beyond Correlation Analysis: An Empirical Metric Preference Checklist
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Iftitahu Ni'mah, Meng Fang, Vlado Menkovski, Mykola Pechenizkiy
15 May 2023

DEnsity: Open-domain Dialogue Evaluation Metric using Density Estimation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yujin Baek, Seungil Lee, Daniel Rim, Jaegul Choo
08 May 2023

Exploring the Use of Large Language Models for Reference-Free Text Quality Evaluation: An Empirical Study
International Joint Conference on Natural Language Processing (IJCNLP), 2023
Yi Chen, Rui Wang, Haiyun Jiang, Shuming Shi, Ruifeng Xu
03 Apr 2023

G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yang Liu, Dan Iter, Yichong Xu, Shuohang Wang, Ruochen Xu, Chenguang Zhu
29 Mar 2023

KPEval: Towards Fine-Grained Semantic-Based Keyphrase Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Di Wu, Da Yin, Kai-Wei Chang
27 Mar 2023

A Transformer-based Response Evaluator for Open-Domain Spoken Conversation
Vrindavan Harrison, Rishi Rajasekaran, M. Walker
09 Feb 2023

GPTScore: Evaluate as You Desire
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Jinlan Fu, See-Kiong Ng, Zhengbao Jiang, Pengfei Liu
08 Feb 2023

Understanding the Effectiveness of Very Large Language Models on Dialog Evaluation
Jessica Huynh, Cathy Jiao, Prakhar Gupta, Shikib Mehri, Payal Bajaj, Vishrav Chaudhary, M. Eskénazi
27 Jan 2023

Opportunities and Challenges in Neural Dialog Tutoring
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Jakub Macina, Nico Daheim, Lingzhi Wang, Tanmay Sinha, Manu Kapur, Iryna Gurevych, Mrinmaya Sachan
24 Jan 2023

On the Blind Spots of Model-Based Evaluation Metrics for Text Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Tianxing He, Jingyu Zhang, Tianle Wang, Sachin Kumar, Dong Wang, James R. Glass, Yulia Tsvetkov
20 Dec 2022

Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Sarah E. Finch, James D. Finch, Jinho Choi
18 Dec 2022

PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Chen Zhang, L. F. D’Haro, Qiquan Zhang, Thomas Friedrichs, Haizhou Li
18 Dec 2022

Bipartite-play Dialogue Collection for Practical Automatic Evaluation of Dialogue Systems
Shiki Sato, Yosuke Kishinami, Hiroaki Sugiyama, Reina Akama, Ryoko Tokuhisa, Jun Suzuki
19 Nov 2022