Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020

5 October 2020

Papers citing "Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems"

HumAIne-Chatbot: Real-Time Personalized Conversational AI via Reinforcement Learning

04 Sep 2025

Is Our Chatbot Telling Lies? Assessing Correctness of an LLM-based Dutch Support Chatbot

Herman Lassche

Michiel Overeem

Ayushi Rastogi

29 Oct 2024

DiverseDialogue: A Methodology for Designing Chatbots with Human-Like Diversity

Ankit Aich

Lyle Ungar

30 Aug 2024

Favi-Score: A Measure for Favoritism in Automated Preference Ratings for Generative AI Evaluation

03 Jun 2024

DiQAD: A Benchmark Dataset for End-to-End Open-domain Dialogue Assessment

25 Oct 2023

Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation

22 Oct 2023

Psychological Metrics for Dialog System Evaluation

João Sedoc

24 May 2023

Approximating Online Human Evaluation of Social Chatbots with Prompting

SIGDIAL Conferences (SIGDIAL), 2023

Ekaterina Svikhnushina

Pearl Pu

11 Apr 2023

Rewarding Chatbots for Real-World Engagement with Millions of Users

...

Christie-Carol Beauchamp

10 Mar 2023

Real or Fake Text?: Investigating Human Ability to Detect Boundaries Between Human-Written and Machine-Generated Text

AAAI Conference on Artificial Intelligence (AAAI), 2022

24 Dec 2022

Evaluating Human-Language Model Interaction

Esin Durmus

...

19 Dec 2022

Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Sarah E. Finch

James D. Finch

Jinho Choi

18 Dec 2022

Bipartite-play Dialogue Collection for Practical Automatic Evaluation of Dialogue Systems

19 Nov 2022

On the Effectiveness of Automated Metrics for Text Generation Systems

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

24 Oct 2022

State-of-the-art in Open-domain Conversational AI: A Survey

Tosin Adewumi

F. Liwicki

Marcus Liwicki

02 May 2022

Towards Robust Online Dialogue Response Generation

Leyang Cui

Fandong Meng

Yanjun Liu

Jie Zhou

Yue Zhang

07 Mar 2022

Recent Advances in Neural Text Generation: A Task-Agnostic Survey

06 Mar 2022

Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents

Jason Weston

12 Jan 2022

Better than Average: Paired Evaluation of NLP Systems

Annual Meeting of the Association for Computational Linguistics (ACL), 2021

20 Oct 2021

Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling

07 Jul 2021

A Comprehensive Assessment of Dialog Evaluation Metrics

Yi-Ting Yeh

M. Eskénazi

Shikib Mehri

07 Jun 2021

Addressing Inquiries about History: An Efficient and Practical Framework for Evaluating Open-domain Chatbot Consistency

Findings, 2021

Zekang Li

Jinchao Zhang

Zhengcong Fei

Yang Feng

Jie Zhou

04 Jun 2021

DynaEval: Unifying Turn and Dialogue Level Evaluation

Annual Meeting of the Association for Computational Linguistics (ACL), 2021

Chen Zhang

Yiming Chen

Haizhou Li

02 Jun 2021

Towards Standard Criteria for Human Evaluation of Chatbots: A Survey

Hongru Liang

Huaqing Li

24 May 2021

Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey

Artificial Intelligence Review (AIR), 2021

10 May 2021

Towards Automated Psychotherapy via Language Modeling

Houjun Liu

05 Apr 2021

Measuring the 'I don't know' Problem through the Lens of Gricean Quantity

North American Chapter of the Association for Computational Linguistics (NAACL), 2020

Huda Khayrallah

João Sedoc

24 Oct 2020