Automatic Evaluation and Moderation of Open-domain Dialogue Systems (arXiv:2111.02110)

3 November 2021
Chen Zhang, João Sedoc, L. F. D’Haro, Rafael E. Banchs, Alexander I. Rudnicky

Papers citing "Automatic Evaluation and Moderation of Open-domain Dialogue Systems"

26 papers shown
Toxicity in Online Platforms and AI Systems: A Survey of Needs, Challenges, Mitigations, and Future Directions
Expert Systems with Applications (ESWA), 2025
Smita Khapre, Melkamu Mersha, Hassan Shakil, Jonali Baruah, Jugal Kalita
29 Sep 2025
Overview of Dialog System Evaluation Track: Dimensionality, Language, Culture and Safety at DSTC 12
John Mendonça, Lining Zhang, Rahul Mallidi, A. Lavie, Isabel Trancoso, L. F. D’Haro, João Sedoc
16 Sep 2025
MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Dialogue Evaluators
John Mendonça, A. Lavie, Isabel Trancoso
28 May 2025
Soda-Eval: Open-Domain Dialogue Evaluation in the age of LLMs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
John Mendonça, Isabel Trancoso, A. Lavie
20 Aug 2024
On the Benchmarking of LLMs for Open-Domain Dialogue Evaluation
John Mendonça, A. Lavie, Isabel Trancoso
04 Jul 2024
Themis: Towards Flexible and Interpretable NLG Evaluation
Xinyu Hu, Li Lin, Mingqi Gao, Xunjian Yin, Xiaojun Wan
26 Jun 2024
An Analysis of User Behaviors for Objectively Evaluating Spoken Dialogue Systems
K. Inoue, Divesh Lala, Keiko Ochi, Tatsuya Kawahara, Gabriel Skantze
10 Jan 2024
The DSA Transparency Database: Auditing Self-reported Moderation Actions by Social Media
Amaury Trujillo, T. Fagni, S. Cresci
16 Dec 2023
Dialogue Quality and Emotion Annotations for Customer Support Conversations
IEEE Games Entertainment Media Conference (IEEE GEM), 2023
John Mendonça, Patrícia Pereira, Miguel Menezes, Vera Cabarrão, Ana C. Farinha, Helena Moniz, João Paulo Carvalho, A. Lavie, Isabel Trancoso
23 Nov 2023
xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Chen Zhang, L. F. D’Haro, Chengguang Tang, Ke Shi, Guohua Tang, Haizhou Li
13 Oct 2023
Towards Multilingual Automatic Dialogue Evaluation
SIGDIAL, 2023
John Mendonça, A. Lavie, Isabel Trancoso
31 Aug 2023
Towards Objective Evaluation of Socially-Situated Conversational Robots: Assessing Human-Likeness through Multimodal User Behaviors
K. Inoue, Divesh Lala, Keiko Ochi, Tatsuya Kawahara, Gabriel Skantze
21 Aug 2023
Overview of Robust and Multilingual Automatic Evaluation Metrics for Open-Domain Dialogue Systems at DSTC 11 Track 4
Mario Rodríguez-Cantelar, Chen Zhang, Chengguang Tang, Ke Shi, Sarik Ghazarian, João Sedoc, L. F. D’Haro, Alexander I. Rudnicky
22 Jun 2023
Psychological Metrics for Dialog System Evaluation
Salvatore Giorgi, Shreya Havaldar, Farhan S. Ahmed, Zuhaib Akhtar, Shalaka Vaidya, Gary Pan, Pallavi V. Kulkarni, H. Andrew Schwartz, João Sedoc
24 May 2023
Evaluate What You Can't Evaluate: Unassessable Quality for Generated Response
Yongkang Liu, Shi Feng, Daling Wang, Yifei Zhang, Hinrich Schütze
24 May 2023
How to Choose How to Choose Your Chatbot: A Massively Multi-System MultiReference Data Set for Dialog Metric Evaluation
Huda Khayrallah, Zuhaib Akhtar, Edward Cohen, João Sedoc
23 May 2023
LLM-Eval: Unified Multi-Dimensional Automatic Evaluation for Open-Domain Conversations with Large Language Models
Yen-Ting Lin, Yun-Nung Chen
23 May 2023
Complex QA and language models hybrid architectures, Survey
Xavier Daull, P. Bellot, Emmanuel Bruno, Vincent Martin, Elisabeth Murisasco
17 Feb 2023
Understanding the Effectiveness of Very Large Language Models on Dialog Evaluation
Jessica Huynh, Cathy Jiao, Prakhar Gupta, Shikib Mehri, Payal Bajaj, Vishrav Chaudhary, M. Eskénazi
27 Jan 2023
PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2022
Chen Zhang, L. F. D’Haro, Qiquan Zhang, Thomas Friedrichs, Haizhou Li
18 Dec 2022
FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Chen Zhang, L. F. D’Haro, Qiquan Zhang, Thomas Friedrichs, Haizhou Li
25 Oct 2022
MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue
Pengfei Zhang, Xiao-fei Hu, Kaidong Yu, Jian Wang, Song-Bo Han, Cao Liu, C. Yuan
19 Jun 2022
Relevance in Dialogue: Is Less More? An Empirical Comparison of Existing Metrics, and a Novel Simple Metric
Ian Berlot-Attwell, Frank Rudzicz
03 Jun 2022
InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Prakhar Gupta, Cathy Jiao, Yi-Ting Yeh, Shikib Mehri, M. Eskénazi, Jeffrey P. Bigham
25 May 2022
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges
Shikib Mehri, Jinho Choi, L. F. D’Haro, Jan Deriu, M. Eskénazi, ..., David Traum, Yi-Ting Yeh, Zhou Yu, Yizhe Zhang, Chen Zhang
18 Mar 2022
MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation
Chen Zhang, L. F. D’Haro, Thomas Friedrichs, Haizhou Li
14 Dec 2021