
USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation

Annual Meeting of the Association for Computational Linguistics (ACL), 2020
1 May 2020
Shikib Mehri
M. Eskénazi

Papers citing "USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation"

50 / 161 papers shown
FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Chen Zhang
L. F. D’Haro
Qiquan Zhang
Thomas Friedrichs
Haizhou Li
25 Oct 2022
On the Effectiveness of Automated Metrics for Text Generation Systems
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Pius von Daniken
Jan Deriu
Don Tuggener
Mark Cieliebak
24 Oct 2022
On the Limitations of Reference-Free Evaluations of Generated Text
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Daniel Deutsch
Rotem Dror
Dan Roth
22 Oct 2022
Deepfake Text Detection: Limitations and Opportunities
IEEE Symposium on Security and Privacy (IEEE S&P), 2022
Jiameng Pu
Zain Sarwar
Sifat Muhammad Abdullah
A. Rehman
Yoonjin Kim
P. Bhattacharya
M. Javed
Bimal Viswanath
17 Oct 2022
Towards a Unified Multi-Dimensional Evaluator for Text Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Ming Zhong
Yang Liu
Da Yin
Yuning Mao
Yizhu Jiao
Peng Liu
Chenguang Zhu
Heng Ji
Jiawei Han
13 Oct 2022
Evaluating Agent Interactions Through Episodic Knowledge Graphs
Selene Báez Santamaría
Piek Vossen
T. Baier
22 Sep 2022
Open-Domain Dialog Evaluation using Follow-Ups Likelihood
International Conference on Computational Linguistics (COLING), 2022
Maxime De Bruyn
Ehsan Lotfi
Jeska Buhmann
Walter Daelemans
12 Sep 2022
Dialogue Evaluation with Offline Reinforcement Learning
SIGDIAL Conference (SIGDIAL), 2022
Nurul Lubis
Christian Geishauser
Hsien-Chin Lin
Carel van Niekerk
Michael Heck
Shutong Feng
Milica Gavsić
02 Sep 2022
The Glass Ceiling of Automatic Evaluation in Natural Language Generation
International Joint Conference on Natural Language Processing (IJCNLP), 2022
Pierre Colombo
Maxime Peyrard
Nathan Noiry
Robert West
Pablo Piantanida
31 Aug 2022
SelF-Eval: Self-supervised Fine-grained Dialogue Evaluation
International Conference on Computational Linguistics (COLING), 2022
Longxuan Ma
Ziyu Zhuang
Weinan Zhang
Mingda Li
Ting Liu
17 Aug 2022
Interactive Evaluation of Dialog Track at DSTC9
International Conference on Language Resources and Evaluation (LREC), 2022
Shikib Mehri
Yulan Feng
Carla Gordon
S. Alavi
David Traum
M. Eskénazi
28 Jul 2022
MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue
Pengfei Zhang
Xiao-fei Hu
Kaidong Yu
Jian Wang
Song-Bo Han
Cao Liu
C. Yuan
19 Jun 2022
Relevance in Dialogue: Is Less More? An Empirical Comparison of Existing Metrics, and a Novel Simple Metric
Ian Berlot-Attwell
Frank Rudzicz
03 Jun 2022
InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Prakhar Gupta
Cathy Jiao
Yi-Ting Yeh
Shikib Mehri
M. Eskénazi
Jeffrey P. Bigham
25 May 2022
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges
Shikib Mehri
Jinho Choi
L. F. D’Haro
Jan Deriu
M. Eskénazi
...
David Traum
Yi-Ting Yeh
Zhou Yu
Yizhe Zhang
Chen Zhang
18 Mar 2022
Achieving Reliable Human Assessment of Open-Domain Dialogue Systems
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Tianbo Ji
Yvette Graham
Gareth J. F. Jones
Chenyang Lyu
Qun Liu
11 Mar 2022
Recent Advances in Neural Text Generation: A Task-Agnostic Survey
Chen Tang
Frank Guerin
Chenghua Lin
06 Mar 2022
Probing the Robustness of Trained Metrics for Conversational Dialogue Systems
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Jan Deriu
Don Tuggener
Pius von Daniken
Mark Cieliebak
28 Feb 2022
FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Jianqiao Zhao
Yanyang Li
Wanyu Du
Yangfeng Ji
Dong Yu
Michael R. Lyu
Liwei Wang
14 Feb 2022
What are the best systems? New perspectives on NLP Benchmarking
Pierre Colombo
Nathan Noiry
Ekhine Irurozki
Nathan Huet
08 Feb 2022
LaMDA: Language Models for Dialog Applications
R. Thoppilan
Daniel De Freitas
Jamie Hall
Noam M. Shazeer
Apoorv Kulshreshtha
...
Blaise Aguera-Arcas
Claire Cui
M. Croak
Ed H. Chi
Quoc Le
20 Jan 2022
Mental Health Assessment for the Chatbots
Yong Shan
Jinchao Zhang
Zekang Li
Yang Feng
Jie Zhou
14 Jan 2022
Measuring Attribution in Natural Language Generation Models
Computational Linguistics (CL), 2021
Hannah Rashkin
Vitaly Nikolaev
Matthew Lamm
Lora Aroyo
Michael Collins
Dipanjan Das
Slav Petrov
Gaurav Singh Tomar
Iulia Turc
David Reitter
23 Dec 2021
Ditch the Gold Standard: Re-evaluating Conversational Question Answering
Huihan Li
Tianyu Gao
Manan Goenka
Danqi Chen
16 Dec 2021
MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation
Chen Zhang
L. F. D’Haro
Thomas Friedrichs
Haizhou Li
14 Dec 2021
Automatic Evaluation and Moderation of Open-domain Dialogue Systems
Chen Zhang
João Sedoc
L. F. D’Haro
Rafael E. Banchs
Alexander I. Rudnicky
03 Nov 2021
Modeling Performance in Open-Domain Dialogue with PARADISE
M. Walker
Colin Harmon
James Graupera
Davan Harrison
S. Whittaker
21 Oct 2021
Better than Average: Paired Evaluation of NLP Systems
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Maxime Peyrard
Wei Zhao
Steffen Eger
Robert West
20 Oct 2021
Investigating the Impact of Pre-trained Language Models on Dialog Evaluation
Chen Zhang
L. F. D’Haro
Yiming Chen
Thomas Friedrichs
Haizhou Li
05 Oct 2021
Identifying Untrustworthy Samples: Data Filtering for Open-domain Dialogues with Bayesian Optimization
Lei Shen
Haolan Zhan
Xin Shen
Hongshen Chen
Xiaofang Zhao
Xiao-Dan Zhu
14 Sep 2021
Commonsense-Focused Dialogues for Response Generation: An Empirical Study
Pei Zhou
Karthik Gopalakrishnan
Behnam Hedayatnia
Seokhwan Kim
Jay Pujara
Xiang Ren
Yang Liu
Dilek Z. Hakkani-Tür
14 Sep 2021
Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation
Mingkai Deng
Bowen Tan
Zhengzhong Liu
Eric Xing
Zhiting Hu
14 Sep 2021
POSSCORE: A Simple Yet Effective Evaluation of Conversational Search with Part of Speech Labelling
International Conference on Information and Knowledge Management (CIKM), 2021
Zeyang Liu
K. Zhou
Jiaxin Mao
Max L. Wilson
07 Sep 2021
Language Model Augmented Relevance Score
Ruibo Liu
Jason W. Wei
Soroush Vosoughi
19 Aug 2021
How to Evaluate Your Dialogue Models: A Review of Approaches
Xinmeng Li
Wansen Wu
Long Qin
Quanjun Yin
03 Aug 2021
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling
Emily Dinan
Gavin Abercrombie
A. S. Bergman
Shannon L. Spruit
Dirk Hovy
Y-Lan Boureau
Verena Rieser
07 Jul 2021
Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation
Findings (Findings), 2021
Prakhar Gupta
Yulia Tsvetkov
Jeffrey P. Bigham
10 Jun 2021
Shades of BLEU, Flavours of Success: The Case of MultiWOZ
IEEE Games Entertainment Media Conference (IEEE GEM), 2021
Tomás Nekvinda
Ondrej Dusek
10 Jun 2021
A Comprehensive Assessment of Dialog Evaluation Metrics
Yi-Ting Yeh
M. Eskénazi
Shikib Mehri
07 Jun 2021
Improving Automated Evaluation of Open Domain Dialog via Diverse Reference Augmentation
Findings (Findings), 2021
Varun Gangal
Harsh Jhamtani
Eduard H. Hovy
Taylor Berg-Kirkpatrick
05 Jun 2021
DynaEval: Unifying Turn and Dialogue Level Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Chen Zhang
Yiming Chen
L. F. D’Haro
Yan Zhang
Thomas Friedrichs
Grandee Lee
Haizhou Li
02 Jun 2021
REAM♯: An Enhancement Approach to Reference-based Evaluation Metrics for Open-domain Dialog Generation
Findings (Findings), 2021
Jun Gao
Wei Bi
Ruifeng Xu
Shuming Shi
30 May 2021
Towards Standard Criteria for Human Evaluation of Chatbots: A Survey
Hongru Liang
Huaqing Li
24 May 2021
Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey
Artificial Intelligence Review (AIR), 2021
Jinjie Ni
Tom Young
Vlad Pandelea
Fuzhao Xue
Xiaoshi Zhong
10 May 2021
Assessing Dialogue Systems with Distribution Distances
Findings (Findings), 2021
Jiannan Xiang
Yahui Liu
Deng Cai
Huayang Li
Defu Lian
Lemao Liu
06 May 2021
Meta-evaluation of Conversational Search Evaluation Metrics
Zeyang Liu
K. Zhou
Max L. Wilson
27 Apr 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
18 Apr 2021
Q²: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Or Honovich
Leshem Choshen
Roee Aharoni
Ella Neeman
Idan Szpektor
Omri Abend
16 Apr 2021
Retrieval Augmentation Reduces Hallucination in Conversation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Kurt Shuster
Spencer Poff
Moya Chen
Douwe Kiela
Jason Weston
15 Apr 2021
Estimating Subjective Crowd-Evaluations as an Additional Objective to Improve Natural Language Generation
Jakob Nyberg
R. Manuvinakurike
Maike Paetzel-Prüsmann
12 Apr 2021