USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Shikib Mehri, M. Eskénazi
arXiv:2005.00456 (1 May 2020)

Papers citing "USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation"

Showing 50 of 161 citing papers (page 1 of 4)

MP2D: An Automated Topic Shift Dialogue Generation Framework Leveraging Knowledge Graphs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Yerin Hwang, Yongi-Mi Kim, Yunah Jang, Jeesoo Bang, Hyunkyung Bae, Kyomin Jung
09 Mar 2024

HD-Eval: Aligning Large Language Model Evaluators Through Hierarchical Criteria Decomposition
Yuxuan Liu, Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
24 Feb 2024

Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation
Dongjin Kang, Sunghwan Kim, Taeyoon Kwon, Seungjun Moon, Hyunsouk Cho, Youngjae Yu, Dongha Lee, Jinyoung Yeo
20 Feb 2024

Are LLM-based Evaluators Confusing NLG Quality Criteria?
Xinyu Hu, Mingqi Gao, Sen Hu, Yang Zhang, Yicheng Chen, Teng Xu, Xiaojun Wan
19 Feb 2024

Leveraging Large Language Models for NLG Evaluation: Advances and Challenges
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zhen Li, Xiaohan Xu, Tao Shen, Can Xu, Jia-Chen Gu, Yuxuan Lai, Chongyang Tao, Shuai Ma
13 Jan 2024

Rethinking Response Evaluation from Interlocutor's Eye for Open-Domain Dialogue Systems
International Joint Conference on Natural Language Processing (IJCNLP), 2024
Tsuta Yuma, Naoki Yoshinaga, Shoetsu Sato, Masashi Toyoda
04 Jan 2024

BatchEval: Towards Human-like Text Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Peiwen Yuan, Shaoxiong Feng, Yiwei Li, Xinglin Wang, Boyuan Pan, Heda Wang, Kan Li
31 Dec 2023

A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators
Chen Zhang, L. F. D’Haro, Yiming Chen, Malu Zhang, Haizhou Li
24 Dec 2023

CoAScore: Chain-of-Aspects Prompting for NLG Evaluation
Peiyuan Gong, Jiaxin Mao
16 Dec 2023

CESAR: Automatic Induction of Compositional Instructions for Multi-turn Dialogs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Taha İbrahim Aksu, Devamanyu Hazarika, Shikib Mehri, Seokhwan Kim, Dilek Z. Hakkani-Tür, Yang Liu, Mahdi Namazifar
29 Nov 2023

Fusion-Eval: Integrating Assistant Evaluators with LLMs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Lei Shu, Nevan Wichers, Liangchen Luo, Yun Zhu, Yinxiao Liu, Jindong Chen, Lei Meng
15 Nov 2023

X-Eval: Generalizable Multi-aspect Text Evaluation via Augmented Instruction Tuning with Auxiliary Evaluation Aspects
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Minqian Liu, Ying Shen, Zhiyang Xu, Yixin Cao, Eunah Cho, Vaibhav Kumar, Reza Ghanadan, Lifu Huang
15 Nov 2023

Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources
Yerin Hwang, Yongi-Mi Kim, Hyunkyung Bae, Jeesoo Bang, Hwanhee Lee, Kyomin Jung
09 Nov 2023

DialogBench: Evaluating LLMs as Human-like Dialogue Systems
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Jiao Ou, Junda Lu, Che Liu, Yihong Tang, Fuzheng Zhang, Chen Zhang, Kun Gai
03 Nov 2023

DiQAD: A Benchmark Dataset for End-to-End Open-domain Dialogue Assessment
Yukun Zhao, Lingyong Yan, Weiwei Sun, Chong Meng, Shuaiqiang Wang, Zhicong Cheng, Zhaochun Ren, D. Yin
25 Oct 2023

xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Chen Zhang, L. F. D’Haro, Chengguang Tang, Ke Shi, Guohua Tang, Haizhou Li
13 Oct 2023

A Closer Look into Automatic Evaluation Using Large Language Models
Cheng-Han Chiang, Hung-yi Lee
09 Oct 2023

Calibrating LLM-Based Evaluator
International Conference on Language Resources and Evaluation (LREC), 2023
Yuxuan Liu, Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
23 Sep 2023

RADE: Reference-Assisted Dialogue Evaluation for Open-Domain Dialogue
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhengliang Shi, Weiwei Sun, Shuo Zhang, Zhen Zhang, Sudipta Singha Roy, Zhaochun Ren
15 Sep 2023

Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation
J. Mendonça, Patrícia Pereira, Helena Moniz, João Paulo Carvalho, A. Lavie, Isabel Trancoso
31 Aug 2023

Towards Multilingual Automatic Dialogue Evaluation
SIGDIAL Conference (SIGDIAL), 2023
John Mendonça, A. Lavie, Isabel Trancoso
31 Aug 2023

GPTEval: A Survey on Assessments of ChatGPT and GPT-4
International Conference on Language Resources and Evaluation (LREC), 2023
Rui Mao, Guanyi Chen, Xulang Zhang, Frank Guerin, Xiaoshi Zhong
24 Aug 2023

ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shan Zhang, Jie Fu, Zhiyuan Liu
14 Aug 2023

Three Ways of Using Large Language Models to Evaluate Chat
Ondřej Plátek, Vojtěch Hudeček, Patrícia Schmidtová, Mateusz Lango, Ondřej Dušek
12 Aug 2023

Athena 2.0: Discourse and User Modeling in Open Domain Dialogue
Omkar Patil, Lena Reed, Kevin K. Bowden, Juraj Juraska, Wen Cui, ..., Phillip Lee, Jeshwanth Bheemanpally, Rohan Pandey, A. Ratnaparkhi, M. Walker
03 Aug 2023

LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Adian Liusie, Potsawee Manakul, Mark Gales
15 Jul 2023

DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Pei Ke, Fei Huang, Fei Mi, Yasheng Wang, Qun Liu, Xiaoyan Zhu, Shiyu Huang
13 Jul 2023

C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation
Workshop on Document-grounded Dialogue and Conversational Question Answering (DialDoc), 2023
Liliang Ren, Mankeerat Sidhu, Qi Zeng, R. Reddy, Heng Ji, Chengxiang Zhai
27 Jun 2023

Overview of Robust and Multilingual Automatic Evaluation Metrics for Open-Domain Dialogue Systems at DSTC 11 Track 4
Mario Rodríguez-Cantelar, Chen Zhang, Chengguang Tang, Ke Shi, Sarik Ghazarian, João Sedoc, L. F. D’Haro, Alexander I. Rudnicky
22 Jun 2023

MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
K. Murugesan, Sarathkrishna Swaminathan, Soham Dan, Subhajit Chaudhury, Chulaka Gunasekara, ..., Ibrahim Abdelaziz, Achille Fokoue, Pavan Kapanipathi, Salim Roukos, Alexander G. Gray
18 Jun 2023

Correction of Errors in Preference Ratings from Automated Metrics for Text Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jan Deriu, Pius von Däniken, Don Tuggener, Mark Cieliebak
06 Jun 2023

Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Tianyi Tang, Hongyuan Lu, Yuchen Eleanor Jiang, Haoyang Huang, Dongdong Zhang, Wayne Xin Zhao, Tom Kocmi, Furu Wei
24 May 2023

Psychological Metrics for Dialog System Evaluation
Salvatore Giorgi, Shreya Havaldar, Farhan S. Ahmed, Zuhaib Akhtar, Shalaka Vaidya, Gary Pan, Pallavi V. Kulkarni, H. Andrew Schwartz, João Sedoc
24 May 2023

Evaluate What You Can't Evaluate: Unassessable Quality for Generated Response
Yongkang Liu, Shi Feng, Daling Wang, Yifei Zhang, Hinrich Schütze
24 May 2023

Asking Clarification Questions to Handle Ambiguity in Open-Domain QA
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Dongryeol Lee, Segwang Kim, Minwoo Lee, Hwanhee Lee, Joonsuk Park, Sang-Woo Lee, Kyomin Jung
23 May 2023

LLM-Eval: Unified Multi-Dimensional Automatic Evaluation for Open-Domain Conversations with Large Language Models
Yen-Ting Lin, Yun-Nung Chen
23 May 2023

Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Anas Himmi, Ekhine Irurozki, Nathan Noiry, Nathan Huet, Pierre Colombo
17 May 2023

NLG Evaluation Metrics Beyond Correlation Analysis: An Empirical Metric Preference Checklist
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Iftitahu Ni'mah, Meng Fang, Vlado Menkovski, Mykola Pechenizkiy
15 May 2023

DEnsity: Open-domain Dialogue Evaluation Metric using Density Estimation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yujin Baek, Seungil Lee, Daniel Rim, Jaegul Choo
08 May 2023

Exploring the Use of Large Language Models for Reference-Free Text Quality Evaluation: An Empirical Study
International Joint Conference on Natural Language Processing (IJCNLP), 2023
Yi Chen, Rui Wang, Haiyun Jiang, Shuming Shi, Ruifeng Xu
03 Apr 2023

G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yang Liu, Dan Iter, Yichong Xu, Shuohang Wang, Ruochen Xu, Chenguang Zhu
29 Mar 2023

KPEval: Towards Fine-Grained Semantic-Based Keyphrase Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Di Wu, Da Yin, Kai-Wei Chang
27 Mar 2023

A Transformer-based Response Evaluator for Open-Domain Spoken Conversation
Vrindavan Harrison, Rishi Rajasekaran, M. Walker
09 Feb 2023

GPTScore: Evaluate as You Desire
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Jinlan Fu, See-Kiong Ng, Zhengbao Jiang, Pengfei Liu
08 Feb 2023

Understanding the Effectiveness of Very Large Language Models on Dialog Evaluation
Jessica Huynh, Cathy Jiao, Prakhar Gupta, Shikib Mehri, Payal Bajaj, Vishrav Chaudhary, M. Eskénazi
27 Jan 2023

Opportunities and Challenges in Neural Dialog Tutoring
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Jakub Macina, Nico Daheim, Lingzhi Wang, Tanmay Sinha, Manu Kapur, Iryna Gurevych, Mrinmaya Sachan
24 Jan 2023

On the Blind Spots of Model-Based Evaluation Metrics for Text Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Tianxing He, Jingyu Zhang, Tianle Wang, Sachin Kumar, Dong Wang, James R. Glass, Yulia Tsvetkov
20 Dec 2022

Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Sarah E. Finch, James D. Finch, Jinho Choi
18 Dec 2022

PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Chen Zhang, L. F. D’Haro, Qiquan Zhang, Thomas Friedrichs, Haizhou Li
18 Dec 2022

Bipartite-play Dialogue Collection for Practical Automatic Evaluation of Dialogue Systems
Shiki Sato, Yosuke Kishinami, Hiroaki Sugiyama, Reina Akama, Ryoko Tokuhisa, Jun Suzuki
19 Nov 2022