How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation

25 March 2016
Chia-Wei Liu
Ryan J. Lowe
Iulian Serban
Michael Noseworthy
Laurent Charlin
Joelle Pineau
arXiv:1603.08023

Papers citing "How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation"

50 / 220 papers shown
Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding
Yifeng Di
Tianyi Zhang
26
0
0
12 May 2025
BoK: Introducing Bag-of-Keywords Loss for Interpretable Dialogue Response Generation
Suvodip Dey
M. Desarkar
OffRL
41
0
0
20 Jan 2025
Measuring the Robustness of Reference-Free Dialogue Evaluation Systems
Justin Vasselli
Adam Nohejl
Taro Watanabe
AAML
49
0
0
12 Jan 2025
AutoSAM: Towards Automatic Sampling of User Behaviors for Sequential Recommender Systems
H. Zhang
Mingyue Cheng
Qi Liu
Ziqiang Liu
Junzhe Jiang
Enhong Chen
AI4TS
46
3
0
03 Jan 2025
LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts
Helia Hashemi
J. Eisner
Corby Rosset
Benjamin Van Durme
Chris Kedzie
68
1
0
03 Jan 2025
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
120
67
0
25 Nov 2024
What is the Role of Small Models in the LLM Era: A Survey
Lihu Chen
Gaël Varoquaux
ALM
63
23
0
10 Sep 2024
ECoh: Turn-level Coherence Evaluation for Multilingual Dialogues
John Mendonça
Isabel Trancoso
A. Lavie
34
3
0
16 Jul 2024
Leveraging LLMs for Dialogue Quality Measurement
Jinghan Jia
A. Komma
Timothy Leffel
Xujun Peng
Ajay Nagesh
Tamer Soliman
Aram Galstyan
Anoop Kumar
34
5
0
25 Jun 2024
Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation
Adam Fisch
Joshua Maynez
R. A. Hofer
Bhuwan Dhingra
Amir Globerson
William W. Cohen
41
8
0
06 Jun 2024
Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
Varun Magesh
Faiz Surani
Matthew Dahl
Mirac Suzgun
Christopher D. Manning
Daniel E. Ho
HILM
ELM
AILaw
27
66
0
30 May 2024
Apollonion: Profile-centric Dialog Agent
Shangyu Chen
Zibo Zhao
Yuanyuan Zhao
Xiang Li
LLMAG
40
1
0
10 Apr 2024
A Survey of Personality, Persona, and Profile in Conversational Agents and Chatbots
Richard Sutcliffe
30
3
0
31 Dec 2023
Partially Randomizing Transformer Weights for Dialogue Response Diversity
Jing Yang Lee
Kong Aik Lee
Woon-Seng Gan
23
0
0
18 Nov 2023
Learning Personalized Alignment for Evaluating Open-ended Text Generation
Danqing Wang
Kevin Kaichuang Yang
Hanlin Zhu
Xiaomeng Yang
Andrew Cohen
Lei Li
Yuandong Tian
ALM
LM&MA
17
8
0
05 Oct 2023
Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models
Qingyue Wang
Y. Fu
Yanan Cao
Zhiliang Tian
Shi Wang
Dacheng Tao
LLMAG
KELM
RALM
59
24
0
29 Aug 2023
Three Ways of Using Large Language Models to Evaluate Chat
Ondřej Plátek
Vojtěch Hudeček
Patrícia Schmidtová
Mateusz Lango
Ondrej Dusek
ALM
19
6
0
12 Aug 2023
f-Divergence Minimization for Sequence-Level Knowledge Distillation
Yuqiao Wen
Zichao Li
Wenyu Du
Lili Mou
30
53
0
27 Jul 2023
Schema-Guided User Satisfaction Modeling for Task-Oriented Dialogues
Yue Feng
Yunlong Jiao
Animesh Prasad
Nikolaos Aletras
Emine Yilmaz
G. Kazai
22
5
0
26 May 2023
Psychological Metrics for Dialog System Evaluation
Salvatore Giorgi
Shreya Havaldar
Farhan S. Ahmed
Zuhaib Akhtar
Shalaka Vaidya
Gary Pan
Pallavi V. Kulkarni
H. A. Schwartz
Joao Sedoc
22
2
0
24 May 2023
Dialogue Games for Benchmarking Language Understanding: Motivation, Taxonomy, Strategy
David Schlangen
ELM
24
13
0
14 Apr 2023
CTRLStruct: Dialogue Structure Learning for Open-Domain Response Generation
Congchi Yin
Pijian Li
Z. Ren
31
11
0
02 Mar 2023
Improving Open-Domain Dialogue Evaluation with a Causal Inference Model
Cat P. Le
Luke Dai
Michael Johnston
Yang Liu
M. Walker
R. Ghanadan
ELM
19
10
0
31 Jan 2023
Improving a sequence-to-sequence nlp model using a reinforcement learning policy algorithm
Jabri Ismail
Aboulbichr Ahmed
El ouaazizi Aziza
16
2
0
28 Dec 2022
CausalDialogue: Modeling Utterance-level Causality in Conversations
Yi-Lin Tuan
Alon Albalak
Wenda Xu
Michael Stephen Saxon
Connor Pryor
Lise Getoor
William Yang Wang
CML
29
2
0
20 Dec 2022
Evaluating Human-Language Model Interaction
Mina Lee
Megha Srivastava
Amelia Hardy
John Thickstun
Esin Durmus
...
Hancheng Cao
Tony Lee
Rishi Bommasani
Michael S. Bernstein
Percy Liang
LM&MA
ALM
58
98
0
19 Dec 2022
PAL: Persona-Augmented Emotional Support Conversation Generation
Jiale Cheng
Sahand Sabour
Hao Sun
Zhuang Chen
Minlie Huang
19
27
0
19 Dec 2022
PVGRU: Generating Diverse and Relevant Dialogue Responses via Pseudo-Variational Mechanism
Yongkang Liu
Shi Feng
Daling Wang
Yifei Zhang
Hinrich Schütze
28
6
0
18 Dec 2022
PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment
Chen Zhang
L. F. D’Haro
Qiquan Zhang
Thomas Friedrichs
Haizhou Li
26
7
0
18 Dec 2022
A Survey on Natural Language Processing for Programming
Qingfu Zhu
Xianzhen Luo
Fang Liu
Cuiyun Gao
Wanxiang Che
23
1
0
12 Dec 2022
Open-world Story Generation with Structured Knowledge Enhancement: A Comprehensive Survey
Yuxin Wang
Jieru Lin
Zhiwei Yu
Wei Hu
Börje F. Karlsson
20
17
0
09 Dec 2022
Deep Fake Detection, Deterrence and Response: Challenges and Opportunities
Amin Azmoodeh
Ali Dehghantanha
42
2
0
26 Nov 2022
CDialog: A Multi-turn Covid-19 Conversation Dataset for Entity-Aware Dialog Generation
Deeksha Varshney
Aizan Zafar
Niranshu Kumar Behra
Asif Ekbal
21
6
0
16 Nov 2022
Multi-VQG: Generating Engaging Questions for Multiple Images
Min-Hsuan Yeh
Vicent Chen
Ting-Hao Huang
Lun-Wei Ku
CoGe
18
7
0
14 Nov 2022
Empathetic Dialogue Generation via Sensitive Emotion Recognition and Sensible Knowledge Selection
Lanrui Wang
JiangNan Li
Zheng Lin
Fandong Meng
Chenxu Yang
Weiping Wang
Jie Zhou
18
30
0
21 Oct 2022
Controllable Fake Document Infilling for Cyber Deception
Yibo Hu
Yu Lin
Eric Parolin
Latif Khan
Kevin W. Hamlen
32
8
0
18 Oct 2022
Dialogue Evaluation with Offline Reinforcement Learning
Nurul Lubis
Christian Geishauser
Hsien-Chin Lin
Carel van Niekerk
Michael Heck
Shutong Feng
Milica Gašić
OffRL
19
4
0
02 Sep 2022
Towards Boosting the Open-Domain Chatbot with Human Feedback
Hua Lu
Siqi Bao
H. He
Fan Wang
Hua-Hong Wu
Haifeng Wang
ALM
20
18
0
30 Aug 2022
Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation
Cyril Chhun
Pierre Colombo
Chloé Clavel
Fabian M. Suchanek
53
50
0
24 Aug 2022
SelF-Eval: Self-supervised Fine-grained Dialogue Evaluation
Longxuan Ma
Ziyu Zhuang
Weinan Zhang
Mingda Li
Ting Liu
26
4
0
17 Aug 2022
Why is constrained neural language generation particularly challenging?
Cristina Garbacea
Qiaozhu Mei
59
14
0
11 Jun 2022
On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting
Tomasz Korbak
Hady ElSahar
Germán Kruszewski
Marc Dymetman
CLL
19
50
0
01 Jun 2022
Commonsense and Named Entity Aware Knowledge Grounded Dialogue Generation
Deeksha Varshney
Akshara Prabhakar
Asif Ekbal
27
18
0
27 May 2022
A Question-Answer Driven Approach to Reveal Affirmative Interpretations from Verbal Negations
Md Mosharaf Hossain
L. Holman
Anusha Kakileti
T. Kao
N. Brito
A. Mathews
Eduardo Blanco
26
3
0
23 May 2022
Computational Storytelling and Emotions: A Survey
Yusuke Mori
Hiroaki Yamane
Yusuke Mukuta
Tatsuya Harada
35
2
0
23 May 2022
CORAL: Contextual Response Retrievability Loss Function for Training Dialog Generation Models
Bishal Santra
Ravi Ghadia
Manish Gupta
Pawan Goyal
OffRL
20
0
0
21 May 2022
Target-Guided Dialogue Response Generation Using Commonsense and Data Augmentation
Prakhar Gupta
Harsh Jhamtani
Jeffrey P. Bigham
46
12
0
19 May 2022
Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets
Philippe Laban
Chien-Sheng Wu
Wenhao Liu
Caiming Xiong
38
5
0
13 May 2022
Vector Representations of Idioms in Conversational Systems
Tosin P. Adewumi
F. Liwicki
Marcus Liwicki
30
8
0
07 May 2022
Balancing Multi-Domain Corpora Learning for Open-Domain Response Generation
Yujie Xing
Jason (Jinglun) Cai
Nils Barlaug
Peng Liu
J. Gulla
29
4
0
05 May 2022