USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation

Annual Meeting of the Association for Computational Linguistics (ACL), 2020
1 May 2020
Shikib Mehri
M. Eskénazi
arXiv: 2005.00456

Papers citing "USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation"

50 / 161 papers shown
Mind the Goal: Data-Efficient Goal-Oriented Evaluation of Conversational Agents and Chatbots using Teacher Models
Deepak Babu Piskala
Sharlene Chen
Udita Patel
Parul Kalra
Rafael Castrillo
LLMAG
85
0
0
04 Oct 2025
MDSEval: A Meta-Evaluation Benchmark for Multimodal Dialogue Summarization
Yinhong Liu
Jianfeng He
Hang Su
Ruixue Lian
Yi Nian
Jake W. Vincent
Srikanth Vishnubhotla
Robinson Piramuthu
Saab Mansour
92
0
0
02 Oct 2025
Evaluating LLM-Generated Versus Human-Authored Responses in Role-Play Dialogues
Dongxu Lu
Johan Jeuring
Albert Gatt
208
0
0
22 Sep 2025
Direct-Scoring NLG Evaluators Can Use Pairwise Comparisons Too
Logan Lawrence
Ashton Williamson
Alexander Shelton
ELM
93
0
0
05 Sep 2025
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges
Khaoula Chehbouni
Mohammed Haddou
Jackie CK Cheung
G. Farnadi
LLMAG
313
6
0
25 Aug 2025
Can LLMs Generate High-Quality Task-Specific Conversations?
Shengqi Li
Amarnath Gupta
LM&MA
146
0
0
04 Aug 2025
Goal Alignment in LLM-Based User Simulators for Conversational AI
Shuhaib Mehri
Xiaocheng Yang
Takyoung Kim
Gokhan Tur
Shikib Mehri
Dilek Hakkani-Tur
LLMAG
115
2
0
27 Jul 2025
LegalEval-Q: A New Benchmark for The Quality Evaluation of LLM-Generated Legal Text
Li Yunhan
Wu Gengshen
AILaw, ELM, ALM
391
1
0
30 May 2025
MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Dialogue Evaluators
John Mendonça
A. Lavie
Isabel Trancoso
402
0
0
28 May 2025
Towards Better Evaluation for Generated Patent Claims
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Lekang Jiang
Pascal A Scherz
Stephan Goetz
ELM
267
5
0
16 May 2025
JaccDiv: A Metric and Benchmark for Quantifying Diversity of Generated Marketing Text in the Music Industry
Anum Afzal
Alexandre Mercier
Florian Matthes
317
0
0
29 Apr 2025
LLM-Evaluation Tropes: Perspectives on the Validity of LLM-Evaluations
Laura Dietz
Oleg Zendel
P. Bailey
Charles L. A. Clarke
Ellese Cotterill
Jeff Dalton
Faegheh Hasibi
Mark Sanderson
Nick Craswell
ELM
255
9
0
27 Apr 2025
LLMs as Span Annotators: A Comparative Study of LLMs and Humans
Zdeněk Kasner
Vilém Zouhar
Patrícia Schmidtová
Ivan Kartáč
Kristýna Onderková
Ondřej Plátek
Dimitra Gkatzia
Saad Mahamood
Ondrej Dusek
Simone Balloccu
ALM
432
7
0
11 Apr 2025
ReFeed: Multi-dimensional Summarization Refinement with Reflective Reasoning on Feedback
Taewon Yun
Jihwan Oh
Hyangsuk Min
Yuho Lee
Jihwan Bang
Jason (Jinglun) Cai
Hwanjun Song
OffRL, LRM
204
1
0
27 Mar 2025
OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs
Ivan Kartáč
Mateusz Lango
Ondrej Dusek
ELM
311
5
0
14 Mar 2025
Positive-Unlabeled Diffusion Models for Preventing Sensitive Data Generation
International Conference on Learning Representations (ICLR), 2025
Hiroshi Takahashi
Tomoharu Iwata
Atsutoshi Kumagai
Yuuki Yamanaka
Tomoya Yamashita
DiffM
258
0
0
05 Mar 2025
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Mingqi Gao
Xinyu Hu
Li Lin
Xiaojun Wan
213
4
0
28 Jan 2025
Beyond correlation: The Impact of Human Uncertainty in Measuring the Effectiveness of Automatic Evaluation and LLM-as-a-Judge
International Conference on Learning Representations (ICLR), 2024
Aparna Elangovan
Jongwoo Ko
Lei Xu
Mahsa Elyasi
Ling Liu
S. Bodapati
Dan Roth
256
19
0
28 Jan 2025
Reference-free Evaluation Metrics for Text Generation: A Survey
Takumi Ito
Kees van Deemter
Jun Suzuki
ELM
318
8
0
21 Jan 2025
BoK: Introducing Bag-of-Keywords Loss for Interpretable Dialogue Response Generation
SIGDIAL Conferences (SIGDIAL), 2025
Suvodip Dey
M. Desarkar
OffRL
213
2
0
20 Jan 2025
Hierarchical Divide-and-Conquer for Fine-Grained Alignment in LLM-Based Medical Evaluation
AAAI Conference on Artificial Intelligence (AAAI), 2025
Shunfan Zheng
Xiechi Zhang
Gerard de Melo
Xiaoling Wang
Linlin Wang
LM&MA, ELM
125
3
0
12 Jan 2025
Measuring the Robustness of Reference-Free Dialogue Evaluation Systems
International Conference on Computational Linguistics (COLING), 2025
Justin Vasselli
Adam Nohejl
Taro Watanabe
AAML
193
0
0
12 Jan 2025
Factors in Crowdsourcing for Evaluation of Complex Dialogue Systems
Annalena Aicher
Stefan Hillmann
Isabel Feustel
Thilo Michael
Sebastian Möller
Wolfgang Minker
142
0
0
17 Nov 2024
Unstructured Text Enhanced Open-domain Dialogue System: A Systematic Survey
Longxuan Ma
Mingda Li
Weinan Zhang
Jiapeng Li
Ting Liu
325
19
0
14 Nov 2024
Bridging the Gap between Expert and Language Models: Concept-guided Chess Commentary Generation and Evaluation
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Jaechang Kim
Jinmin Goh
Inseok Hwang
Jaewoong Cho
Jungseul Ok
ELM
215
6
0
28 Oct 2024
AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs
Clemencia Siro
Yifei Yuan
Mohammad Aliannejadi
Maarten de Rijke
ELM
201
6
0
25 Oct 2024
4-LEGS: 4D Language Embedded Gaussian Splatting
Gal Fiebelman
Tamir Cohen
Ayellet Morgenstern
Peter Hedman
Hadar Averbuch-Elor
3DGS
390
3
0
14 Oct 2024
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References
International Conference on Learning Representations (ICLR), 2024
Qiyuan Zhang
Yufei Wang
Tiezheng Yu
Yuxin Jiang
Chuhan Wu
...
Xin Jiang
Lifeng Shang
Ruiming Tang
Fuyuan Lyu
Chen Ma
319
12
0
07 Oct 2024
CRScore: Grounding Automated Evaluation of Code Review Comments in Code Claims and Smells
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Atharva Naik
Marcus Alenius
Daniel Fried
Carolyn Rose
280
4
0
29 Sep 2024
Poor-Supervised Evaluation for SuperLLM via Mutual Consistency
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Peiwen Yuan
Shaoxiong Feng
Yiwei Li
Xinglin Wang
Boyuan Pan
Heda Wang
Yao Hu
Kan Li
217
1
0
25 Aug 2024
Soda-Eval: Open-Domain Dialogue Evaluation in the age of LLMs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
John Mendonça
Isabel Trancoso
A. Lavie
ALM
214
13
0
20 Aug 2024
ECoh: Turn-level Coherence Evaluation for Multilingual Dialogues
John Mendonça
Isabel Trancoso
A. Lavie
172
5
0
16 Jul 2024
A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models: Safety, Consensus, Objectivity, Reproducibility and Explainability
Ting Fang Tan
Kabilan Elangovan
J. Ong
Nigam Shah
J. Sung
...
Haibo Wang
Chang Fu Kuo
Simon Chesterman
Zee Kin Yeong
Daniel Ting
ELM
114
10
0
10 Jul 2024
On the Benchmarking of LLMs for Open-Domain Dialogue Evaluation
John Mendonça
A. Lavie
Isabel Trancoso
ELM
133
13
0
04 Jul 2024
LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
A. Bavaresco
Raffaella Bernardi
Leonardo Bertolazzi
Desmond Elliott
Raquel Fernández
...
David Schlangen
Alessandro Suglia
Aditya K Surikuchi
Ece Takmaz
A. Testoni
ALM, ELM
596
169
0
26 Jun 2024
Leveraging LLMs for Dialogue Quality Measurement
Jinghan Jia
A. Komma
Timothy Leffel
Xujun Peng
Ajay Nagesh
Tamer Soliman
Aram Galstyan
Anoop Kumar
227
7
0
25 Jun 2024
Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments
Han Zhou
Xingchen Wan
Yinhong Liu
Nigel Collier
Ivan Vulić
Anna Korhonen
ALM
181
20
0
17 Jun 2024
ComperDial: Commonsense Persona-grounded Dialogue Dataset and Benchmark
Hiromi Wakaki
Yuki Mitsufuji
Yoshinori Maeda
Yukiko Nishimura
Silin Gao
Mengjie Zhao
Keiichi Yamada
Antoine Bosselut
215
2
0
17 Jun 2024
Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling
Jie Ruan
Xiao Pu
Mingqi Gao
Xiaojun Wan
Yuesheng Zhu
182
7
0
12 Jun 2024
Recent Trends in Personalized Dialogue Generation: A Review of Datasets, Methodologies, and Evaluations
Yi-Pei Chen
Noriki Nishida
Hideki Nakayama
Yuji Matsumoto
LLMAG
264
27
0
28 May 2024
Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yiming Chen
Chen Zhang
Danqing Luo
L. F. D’Haro
R. Tan
Haizhou Li
AAML, ELM
205
3
0
23 May 2024
DEBATE: Devil's Advocate-Based Assessment and Text Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Alex G. Kim
Keonwoo Kim
Sangwon Yoon
ELM
296
16
0
16 May 2024
Efficient LLM Comparative Assessment: a Product of Experts Framework for Pairwise Comparisons
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Adian Liusie
Vatsal Raina
Yassir Fathullah
Mark Gales
236
16
0
09 May 2024
RepEval: Effective Text Evaluation with LLM Representation
Shuqian Sheng
Yi Xu
Tianhang Zhang
Zanwei Shen
Luoyi Fu
Jiaxin Ding
Lei Zhou
Xinbing Wang
Cheng Zhou
155
7
0
30 Apr 2024
Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs
Clemencia Siro
Mohammad Aliannejadi
Maarten de Rijke
108
5
0
19 Apr 2024
Inductive-Deductive Strategy Reuse for Multi-Turn Instructional Dialogues
Jiao Ou
Jiayu Wu
Che Liu
Fuzheng Zhang
Chen Zhang
Kun Gai
137
7
0
17 Apr 2024
Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems
Clemencia Siro
Mohammad Aliannejadi
Maarten de Rijke
154
3
0
15 Apr 2024
PairEval: Open-domain Dialogue Evaluation with Pairwise Comparison
Yujin Baek
Minseok Choi
Dohyun Lee
Jaegul Choo
296
14
0
01 Apr 2024
FEEL: A Framework for Evaluating Emotional Support Capability with Large Language Models
International Conference on Intelligent Computing (ICIC), 2024
Huaiwen Zhang
Yu Chen
Ming Wang
Shi Feng
252
3
0
23 Mar 2024
Is Reference Necessary in the Evaluation of NLG Systems? When and Where?
Shuqian Sheng
Yi Xu
Luoyi Fu
Jiaxin Ding
Lei Zhou
Xinbing Wang
Cheng Zhou
162
6
0
21 Mar 2024