Learning an Unreferenced Metric for Online Dialogue Evaluation

Annual Meeting of the Association for Computational Linguistics (ACL), 2020
1 May 2020
Koustuv Sinha
Prasanna Parthasarathi
Jasmine Wang
Ryan J. Lowe
William L. Hamilton
Joelle Pineau
    OffRL

Papers citing "Learning an Unreferenced Metric for Online Dialogue Evaluation"

50 / 57 papers shown
Evaluating LLM-Generated Versus Human-Authored Responses in Role-Play Dialogues
Dongxu Lu
Johan Jeuring
Albert Gatt
274
1
0
22 Sep 2025
BoK: Introducing Bag-of-Keywords Loss for Interpretable Dialogue Response Generation
SIGDIAL Conferences (SIGDIAL), 2025
Suvodip Dey
M. Desarkar
OffRL
345
2
0
20 Jan 2025
Interaction Matters: An Evaluation Framework for Interactive Dialogue Assessment on English Second Language Conversations
Rena Gao
Carsten Roever
Jey Han Lau
252
7
0
09 Jul 2024
Favi-Score: A Measure for Favoritism in Automated Preference Ratings for Generative AI Evaluation
Pius von Daniken
Jan Deriu
Don Tuggener
Mark Cieliebak
269
2
0
03 Jun 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois
Balázs Galambosi
Abigail Z. Jacobs
Tatsunori Hashimoto
ALM
543
727
0
06 Apr 2024
Only Send What You Need: Learning to Communicate Efficiently in Federated Multilingual Machine Translation
IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024
Yun-Wei Chu
Dong-Jun Han
Christopher G. Brinton
383
7
0
15 Jan 2024
A Survey of Personality, Persona, and Profile in Conversational Agents and Chatbots
Richard Sutcliffe
488
10
0
31 Dec 2023
CoAScore: Chain-of-Aspects Prompting for NLG Evaluation
Peiyuan Gong
Jiaxin Mao
ELM
365
16
0
16 Dec 2023
Dialogue Quality and Emotion Annotations for Customer Support Conversations
IEEE Games Entertainment Media Conference (IEEE GEM), 2023
John Mendonça
Patrícia Pereira
Miguel Menezes
Vera Cabarrão
Ana C. Farinha
Helena Moniz
Joao Paulo Carvalho
A. Lavie
Isabel Trancoso
204
4
0
23 Nov 2023
Automatic Evaluation of Generative Models with Instruction Tuning
IEEE Games Entertainment Media Conference (IEEE GEM), 2023
Shuhaib Mehri
Vered Shwartz
ELM
ALM
185
4
0
30 Oct 2023
DiQAD: A Benchmark Dataset for End-to-End Open-domain Dialogue Assessment
Yukun Zhao
Lingyong Yan
Weiwei Sun
Chong Meng
Shuaiqiang Wang
Zhicong Cheng
Zhaochun Ren
D. Yin
ELM
179
0
0
25 Oct 2023
xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Chen Zhang
L. F. D’Haro
Chengguang Tang
Ke Shi
Guohua Tang
Haizhou Li
ELM
247
17
0
13 Oct 2023
RADE: Reference-Assisted Dialogue Evaluation for Open-Domain Dialogue
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhengliang Shi
Weiwei Sun
Shuo Zhang
Zhen Zhang
Sudipta Singha Roy
Zhaochun Ren
341
12
0
15 Sep 2023
Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation
J. Mendonça
Patrícia Pereira
Helena Moniz
Joao Paulo Carvalho
A. Lavie
Isabel Trancoso
298
26
0
31 Aug 2023
Towards Multilingual Automatic Dialogue Evaluation
SIGDIAL Conferences (SIGDIAL), 2023
John Mendonça
A. Lavie
Isabel Trancoso
206
0
0
31 Aug 2023
Three Ways of Using Large Language Models to Evaluate Chat
Ondřej Plátek
Vojtěch Hudeček
Patrícia Schmidtová
Mateusz Lango
Ondrej Dusek
ALM
234
7
0
12 Aug 2023
Correction of Errors in Preference Ratings from Automated Metrics for Text Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jan Deriu
Pius von Daniken
Don Tuggener
Mark Cieliebak
253
2
0
06 Jun 2023
Evaluating Open-Domain Dialogues in Latent Space with Next Sentence Prediction and Mutual Information
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Kun Zhao
Bohao Yang
Chenghua Lin
Wenge Rong
Aline Villavicencio
Xiaohui Cui
DRL
298
12
0
26 May 2023
What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Mario Giulianelli
Joris Baan
Wilker Aziz
Raquel Fernández
Barbara Plank
UQLM
536
46
0
19 May 2023
DEnsity: Open-domain Dialogue Evaluation Metric using Density Estimation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yujin Baek
Seungil Lee
Daniel Rim
Jaegul Choo
272
6
0
08 May 2023
Improving Open-Domain Dialogue Evaluation with a Causal Inference Model
Cat P. Le
Luke Dai
Michael Johnston
Yang Liu
M. Walker
R. Ghanadan
ELM
218
11
0
31 Jan 2023
PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Chen Zhang
L. F. D’Haro
Qiquan Zhang
Thomas Friedrichs
Haizhou Li
212
8
0
18 Dec 2022
FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Chen Zhang
L. F. D’Haro
Qiquan Zhang
Thomas Friedrichs
Haizhou Li
242
24
0
25 Oct 2022
Measuring and Improving Semantic Diversity of Dialogue Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Seungju Han
Beomsu Kim
Buru Chang
220
24
0
11 Oct 2022
An Equal-Size Hard EM Algorithm for Diverse Dialogue Generation
International Conference on Learning Representations (ICLR), 2022
Yuqiao Wen
Yongchang Hao
Yanshuai Cao
Lili Mou
351
15
0
29 Sep 2022
Open-Domain Dialog Evaluation using Follow-Ups Likelihood
International Conference on Computational Linguistics (COLING), 2022
Maxime De Bruyn
Ehsan Lotfi
Jeska Buhmann
Walter Daelemans
242
9
0
12 Sep 2022
Evaluation of Question Answering Systems: Complexity of judging a natural language
ACM Computing Surveys (ACM CSUR), 2022
Amer Farea
Zhen Yang
Kien Duong
Nadeesha Perera
F. Emmert-Streib
ELM
319
13
0
10 Sep 2022
Interactive Evaluation of Dialog Track at DSTC9
International Conference on Language Resources and Evaluation (LREC), 2022
Shikib Mehri
Yulan Feng
Carla Gordon
S. Alavi
David Traum
M. Eskénazi
366
14
0
28 Jul 2022
MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue
Pengfei Zhang
Xiao-fei Hu
Kaidong Yu
Jian Wang
Song-Bo Han
Cao Liu
C. Yuan
169
7
0
19 Jun 2022
Why is constrained neural language generation particularly challenging?
Cristina Garbacea
Qiaozhu Mei
486
20
0
11 Jun 2022
InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Prakhar Gupta
Cathy Jiao
Yi-Ting Yeh
Shikib Mehri
M. Eskénazi
Jeffrey P. Bigham
ALM
411
56
0
25 May 2022
What should I Ask: A Knowledge-driven Approach for Follow-up Questions Generation in Conversational Surveys
Pacific Asia Conference on Language, Information and Computation (PACLIC), 2022
Yubin Ge
Ziang Xiao
Jana Diesner
Heng Ji
Karrie Karahalios
Hari Sundaram
393
26
0
23 May 2022
CORAL: Contextual Response Retrievability Loss Function for Training Dialog Generation Models
Bishal Santra
Ravi Ghadia
Manish Gupta
Pawan Goyal
OffRL
320
0
0
21 May 2022
Meet Your Favorite Character: Open-domain Chatbot Mimicking Fictional Characters with only a Few Utterances
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Seungju Han
Beomsu Kim
Jin Yong Yoo
Seokjun Seo
Sangbum Kim
Enkhbayar Erdenee
Buru Chang
AI4CE
278
46
0
22 Apr 2022
Persona-Guided Planning for Controlling the Protagonist's Persona in Story Generation
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Zhexin Zhang
Jiaxin Wen
Jian Guan
Shiyu Huang
199
27
0
22 Apr 2022
Spurious Correlations in Reference-Free Evaluation of Text Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Esin Durmus
Faisal Ladhak
Tatsunori Hashimoto
184
37
0
21 Apr 2022
What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation
Findings (Findings), 2022
Sarik Ghazarian
Behnam Hedayatnia
Alexandros Papangelis
Yang Liu
Dilek Z. Hakkani-Tür
259
22
0
25 Mar 2022
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges
Shikib Mehri
Jinho Choi
L. F. D’Haro
Jan Deriu
M. Eskénazi
...
David Traum
Yi-Ting Yeh
Zhou Yu
Yizhe Zhang
Chen Zhang
273
23
0
18 Mar 2022
Probing the Robustness of Trained Metrics for Conversational Dialogue Systems
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Jan Deriu
Don Tuggener
Pius von Daniken
Mark Cieliebak
AAML
168
11
0
28 Feb 2022
MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation
Chen Zhang
L. F. D’Haro
Thomas Friedrichs
Haizhou Li
ELM
283
22
0
14 Dec 2021
Identifying Untrustworthy Samples: Data Filtering for Open-domain Dialogues with Bayesian Optimization
Lei Shen
Haolan Zhan
Xin Shen
Hongshen Chen
Xiaofang Zhao
Xiao-Dan Zhu
305
19
0
14 Sep 2021
Perturbation CheckLists for Evaluating NLG Evaluation Metrics
Ananya B. Sai
Tanay Dixit
D. Y. Sheth
S. Mohan
Mitesh M. Khapra
AAML
403
66
0
13 Sep 2021
POSSCORE: A Simple Yet Effective Evaluation of Conversational Search with Part of Speech Labelling
International Conference on Information and Knowledge Management (CIKM), 2021
Zeyang Liu
K. Zhou
Jiaxin Mao
Max L. Wilson
199
3
0
07 Sep 2021
Distilling the Knowledge of Large-scale Generative Models into Retrieval Models for Efficient Open-domain Conversation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Beomsu Kim
Seokjun Seo
Seungju Han
Enkhbayar Erdenee
Buru Chang
RALM
257
6
0
28 Aug 2021
Do Encoder Representations of Generative Dialogue Models Encode Sufficient Information about the Task ?
SIGDIAL Conferences (SIGDIAL), 2021
Prasanna Parthasarathi
J. Pineau
Sarath Chandar
263
2
0
20 Jun 2021
A Brief Study on the Effects of Training Generative Dialogue Models with a Semantic loss
SIGDIAL Conferences (SIGDIAL), 2021
Prasanna Parthasarathi
Mohamed Abdelsalam
J. Pineau
Sarath Chandar
145
0
0
20 Jun 2021
A Comprehensive Assessment of Dialog Evaluation Metrics
Yi-Ting Yeh
M. Eskénazi
Shikib Mehri
358
122
0
07 Jun 2021
DynaEval: Unifying Turn and Dialogue Level Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Chen Zhang
Yiming Chen
L. F. D’Haro
Yan Zhang
Thomas Friedrichs
Grandee Lee
Haizhou Li
227
81
0
02 Jun 2021
Towards Standard Criteria for human evaluation of Chatbots: A Survey
Hongru Liang
Huaqing Li
191
18
0
24 May 2021
Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey
Artificial Intelligence Review (AIR), 2021
Jinjie Ni
Tom Young
Vlad Pandelea
Fuzhao Xue
Xiaoshi Zhong
1.0K
336
0
10 May 2021