Designing Precise and Robust Dialogue Response Evaluators

10 April 2020
Tianyu Zhao
Divesh Lala
Tatsuya Kawahara

Papers citing "Designing Precise and Robust Dialogue Response Evaluators"

39 / 39 papers shown
OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs
Ivan Kartáč
Mateusz Lango
Ondrej Dusek
ELM
85
1
0
14 Mar 2025
Measuring the Robustness of Reference-Free Dialogue Evaluation Systems
Justin Vasselli
Adam Nohejl
Taro Watanabe
AAML
80
0
0
12 Jan 2025
Masked Graph Learning with Recurrent Alignment for Multimodal Emotion Recognition in Conversation
Tao Meng
Fuchen Zhang
Yuntao Shou
Hongen Shao
Wei Ai
Keqin Li
94
17
0
23 Jul 2024
ECoh: Turn-level Coherence Evaluation for Multilingual Dialogues
John Mendonça
Isabel Trancoso
A. Lavie
67
3
0
16 Jul 2024
On the Benchmarking of LLMs for Open-Domain Dialogue Evaluation
John Mendonça
A. Lavie
Isabel Trancoso
ELM
55
3
0
04 Jul 2024
Themis: Towards Flexible and Interpretable NLG Evaluation
Xinyu Hu
Li Lin
Mingqi Gao
Xunjian Yin
Xiaojun Wan
ELM
91
8
0
26 Jun 2024
ComperDial: Commonsense Persona-grounded Dialogue Dataset and Benchmark
Hiromi Wakaki
Yuki Mitsufuji
Yoshinori Maeda
Yukiko Nishimura
Silin Gao
Mengjie Zhao
Keiichi Yamada
Antoine Bosselut
96
0
0
17 Jun 2024
Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models
Yiming Chen
Chen Zhang
Danqing Luo
L. F. D’Haro
R. Tan
Haizhou Li
AAML, ELM
89
3
0
23 May 2024
PairEval: Open-domain Dialogue Evaluation with Pairwise Comparison
Yujin Baek
Minseok Choi
Dohyun Lee
Jaegul Choo
89
8
0
01 Apr 2024
A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators
Chen Zhang
L. F. D’Haro
Yiming Chen
Malu Zhang
Haizhou Li
ELM
80
31
0
24 Dec 2023
CESAR: Automatic Induction of Compositional Instructions for Multi-turn Dialogs
Taha İbrahim Aksu
Devamanyu Hazarika
Shikib Mehri
Seokhwan Kim
Dilek Z. Hakkani-Tür
Yang Liu
Mahdi Namazifar
112
3
0
29 Nov 2023
Automatic Evaluation of Generative Models with Instruction Tuning
Shuhaib Mehri
Vered Shwartz
ELM, ALM
51
1
0
30 Oct 2023
DiQAD: A Benchmark Dataset for End-to-End Open-domain Dialogue Assessment
Yukun Zhao
Lingyong Yan
Weiwei Sun
Chong Meng
Shuaiqiang Wang
Zhicong Cheng
Zhaochun Ren
D. Yin
ELM
52
0
0
25 Oct 2023
xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark
Chen Zhang
L. F. D’Haro
Chengguang Tang
Ke Shi
Guohua Tang
Haizhou Li
ELM
72
11
0
13 Oct 2023
Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation
John Mendonça
Patrícia Pereira
Helena Moniz
Joao Paulo Carvalho
A. Lavie
Isabel Trancoso
106
19
0
31 Aug 2023
Towards Multilingual Automatic Dialogue Evaluation
John Mendonça
A. Lavie
Isabel Trancoso
55
0
0
31 Aug 2023
Overview of Robust and Multilingual Automatic Evaluation Metrics for Open-Domain Dialogue Systems at DSTC 11 Track 4
Mario Rodríguez-Cantelar
Chen Zhang
Chengguang Tang
Ke Shi
Sarik Ghazarian
João Sedoc
L. F. D’Haro
Alexander I. Rudnicky
89
10
0
22 Jun 2023
Evaluating Open-Domain Dialogues in Latent Space with Next Sentence Prediction and Mutual Information
Kun Zhao
Bohao Yang
Chenghua Lin
Wenge Rong
Aline Villavicencio
Xiaohui Cui
DRL
67
4
0
26 May 2023
Psychological Metrics for Dialog System Evaluation
Salvatore Giorgi
Shreya Havaldar
Farhan S. Ahmed
Zuhaib Akhtar
Shalaka Vaidya
Gary Pan
Pallavi V. Kulkarni
H. Andrew Schwartz
Joao Sedoc
94
2
0
24 May 2023
DEnsity: Open-domain Dialogue Evaluation Metric using Density Estimation
Yujin Baek
Seungil Lee
Daniel Rim
Jaegul Choo
50
4
0
08 May 2023
Understanding the Effectiveness of Very Large Language Models on Dialog Evaluation
Jessica Huynh
Cathy Jiao
Prakhar Gupta
Shikib Mehri
Payal Bajaj
Vishrav Chaudhary
M. Eskénazi
ELM, LM&MA
73
17
0
27 Jan 2023
PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment
Chen Zhang
L. F. D’Haro
Qiquan Zhang
Thomas Friedrichs
Haizhou Li
77
7
0
18 Dec 2022
State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Ryan Cotterell
Zhijing Jin
270
99
0
06 Oct 2022
MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue
Pengfei Zhang
Xiao-fei Hu
Kaidong Yu
Jian Wang
Song-Bo Han
Cao Liu
C. Yuan
46
7
0
19 Jun 2022
Generate, Evaluate, and Select: A Dialogue System with a Response Evaluator for Diversity-Aware Response Generation
Ryoma Sakaeda
Daisuke Kawahara
28
5
0
10 Jun 2022
Empathic Conversations: A Multi-level Dataset of Contextualized Conversations
Damilola Omitaomu
Shabnam Tafreshi
Tingting Liu
Sven Buechel
Chris Callison-Burch
J. Eichstaedt
Lyle Ungar
João Sedoc
107
50
0
25 May 2022
InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning
Prakhar Gupta
Cathy Jiao
Yi-Ting Yeh
Shikib Mehri
M. Eskénazi
Jeffrey P. Bigham
ALM
119
48
0
25 May 2022
MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation
Chen Zhang
L. F. D’Haro
Thomas Friedrichs
Haizhou Li
ELM
79
19
0
14 Dec 2021
Automatic Evaluation and Moderation of Open-domain Dialogue Systems
Chen Zhang
João Sedoc
L. F. D’Haro
Rafael E. Banchs
Alexander I. Rudnicky
78
38
0
03 Nov 2021
Investigating the Impact of Pre-trained Language Models on Dialog Evaluation
Chen Zhang
L. F. D’Haro
Yiming Chen
Thomas Friedrichs
Haizhou Li
63
5
0
05 Oct 2021
Identifying Untrustworthy Samples: Data Filtering for Open-domain Dialogues with Bayesian Optimization
Lei Shen
Haolan Zhan
Xin Shen
Hongshen Chen
Xiaofang Zhao
Xiao-Dan Zhu
83
17
0
14 Sep 2021
How to Evaluate Your Dialogue Models: A Review of Approaches
Xinmeng Li
Wansen Wu
Long Qin
Quanjun Yin
ELM
58
8
0
03 Aug 2021
Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation
Prakhar Gupta
Yulia Tsvetkov
Jeffrey P. Bigham
86
23
0
10 Jun 2021
Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey
Jinjie Ni
Tom Young
Vlad Pandelea
Fuzhao Xue
Min Zhang
225
280
0
10 May 2021
Assessing Dialogue Systems with Distribution Distances
Jiannan Xiang
Yahui Liu
Deng Cai
Huayang Li
Defu Lian
Lemao Liu
75
18
0
06 May 2021
$Q^{2}$: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering
Or Honovich
Leshem Choshen
Roee Aharoni
Ella Neeman
Idan Szpektor
Omri Abend
HILM
104
143
0
16 Apr 2021
Multi-Referenced Training for Dialogue Response Generation
Tianyu Zhao
Tatsuya Kawahara
SyDa
67
9
0
15 Sep 2020
Dialogue-adaptive Language Model Pre-training From Quality Estimation
Junlong Li
Zhuosheng Zhang
Hai Zhao
OffRL
68
12
0
10 Sep 2020
A Survey of Evaluation Metrics Used for NLG Systems
Ananya B. Sai
Akash Kumar Mohankumar
Mitesh M. Khapra
ELM
99
237
0
27 Aug 2020