2010.02140
Cited By
Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
5 October 2020
Jan Deriu
Don Tuggener
Pius von Daniken
Jon Ander Campos
Álvaro Rodrigo
Thiziri Belkacem
Aitor Soroa Etxabe
Eneko Agirre
Mark Cieliebak
Papers citing "Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems"
27 / 27 papers shown
HumAIne-Chatbot: Real-Time Personalized Conversational AI via Reinforcement Learning
Georgios Makridis
Georgios Fragiadakis
Jorge Oliveira
Tomaz Saraiva
Philip Mavrepis
G. Fatouros
D. Kyriazis
OffRL
107
0
0
04 Sep 2025
Is Our Chatbot Telling Lies? Assessing Correctness of an LLM-based Dutch Support Chatbot
Herman Lassche
Michiel Overeem
Ayushi Rastogi
307
0
0
29 Oct 2024
DiverseDialogue: A Methodology for Designing Chatbots with Human-Like Diversity
Xiaoyu Lin
Xinkai Yu
Ankit Aich
Salvatore Giorgi
Lyle Ungar
ALM
158
3
0
30 Aug 2024
Favi-Score: A Measure for Favoritism in Automated Preference Ratings for Generative AI Evaluation
Pius von Daniken
Jan Deriu
Don Tuggener
Mark Cieliebak
232
2
0
03 Jun 2024
DiQAD: A Benchmark Dataset for End-to-End Open-domain Dialogue Assessment
Yukun Zhao
Lingyong Yan
Weiwei Sun
Chong Meng
Shuaiqiang Wang
Zhicong Cheng
Zhaochun Ren
D. Yin
ELM
151
0
0
25 Oct 2023
Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation
M. Boubdir
Edward Kim
Beyza Ermis
Marzieh Fadaee
Sara Hooker
ALM
283
21
0
22 Oct 2023
Psychological Metrics for Dialog System Evaluation
Salvatore Giorgi
Shreya Havaldar
Farhan S. Ahmed
Zuhaib Akhtar
Shalaka Vaidya
Gary Pan
Pallavi V. Kulkarni
H. Andrew Schwartz
Joao Sedoc
375
6
0
24 May 2023
Approximating Online Human Evaluation of Social Chatbots with Prompting
SIGDIAL Conferences (SIGDIAL), 2023
Ekaterina Svikhnushina
Pearl Pu
ELM
254
14
0
11 Apr 2023
Rewarding Chatbots for Real-World Engagement with Millions of Users
R. Irvine
D. Boubert
Vyas Raina
Adian Liusie
Ziyi Zhu
...
Valentin Assassi
Christie-Carol Beauchamp
Xiaoding Lu
Thomas Rialan
W. Beauchamp
ALM
231
58
0
10 Mar 2023
Real or Fake Text?: Investigating Human Ability to Detect Boundaries Between Human-Written and Machine-Generated Text
AAAI Conference on Artificial Intelligence (AAAI), 2022
Liam Dugan
Daphne Ippolito
Arun Kirubarajan
Sherry Shi
Chris Callison-Burch
DeLMO
273
98
0
24 Dec 2022
Evaluating Human-Language Model Interaction
Mina Lee
Megha Srivastava
Amelia Hardy
John Thickstun
Esin Durmus
...
Hancheng Cao
Tony Lee
Rishi Bommasani
Michael S. Bernstein
Abigail Z. Jacobs
LM&MA
ALM
304
119
0
19 Dec 2022
Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Sarah E. Finch
James D. Finch
Jinho Choi
275
15
0
18 Dec 2022
Bipartite-play Dialogue Collection for Practical Automatic Evaluation of Dialogue Systems
Shiki Sato
Yosuke Kishinami
Hiroaki Sugiyama
Reina Akama
Ryoko Tokuhisa
Jun Suzuki
269
2
0
19 Nov 2022
On the Effectiveness of Automated Metrics for Text Generation Systems
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Pius von Daniken
Jan Deriu
Don Tuggener
Mark Cieliebak
235
3
0
24 Oct 2022
State-of-the-art in Open-domain Conversational AI: A Survey
Tosin Adewumi
F. Liwicki
Marcus Liwicki
312
18
0
02 May 2022
Towards Robust Online Dialogue Response Generation
Leyang Cui
Fandong Meng
Yanjun Liu
Jie Zhou
Yue Zhang
153
1
0
07 Mar 2022
Recent Advances in Neural Text Generation: A Task-Agnostic Survey
Chen Tang
Frank Guerin
Chenghua Lin
AI4CE
OOD
362
20
0
06 Mar 2022
Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents
Eric Michael Smith
Orion Hsu
Rebecca Qian
Stephen Roller
Y-Lan Boureau
Jason Weston
224
78
0
12 Jan 2022
Better than Average: Paired Evaluation of NLP Systems
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Maxime Peyrard
Wei Zhao
Steffen Eger
Robert West
ELM
246
32
0
20 Oct 2021
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling
Emily Dinan
Gavin Abercrombie
A. S. Bergman
Shannon L. Spruit
Dirk Hovy
Y-Lan Boureau
Verena Rieser
333
115
0
07 Jul 2021
A Comprehensive Assessment of Dialog Evaluation Metrics
Yi-Ting Yeh
M. Eskénazi
Shikib Mehri
291
116
0
07 Jun 2021
Addressing Inquiries about History: An Efficient and Practical Framework for Evaluating Open-domain Chatbot Consistency
Findings of the Association for Computational Linguistics (Findings), 2021
Zekang Li
Jinchao Zhang
Zhengcong Fei
Yang Feng
Jie Zhou
108
15
0
04 Jun 2021
DynaEval: Unifying Turn and Dialogue Level Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Chen Zhang
Yiming Chen
L. F. D’Haro
Yan Zhang
Thomas Friedrichs
Grandee Lee
Haizhou Li
182
78
0
02 Jun 2021
Towards Standard Criteria for human evaluation of Chatbots: A Survey
Hongru Liang
Huaqing Li
162
16
0
24 May 2021
Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey
Artificial Intelligence Review (AIR), 2021
Jinjie Ni
Tom Young
Vlad Pandelea
Fuzhao Xue
Xiaoshi Zhong
837
322
0
10 May 2021
Towards Automated Psychotherapy via Language Modeling
Houjun Liu
AI4MH
184
4
0
05 Apr 2021
Measuring the 'I don't know' Problem through the Lens of Gricean Quantity
North American Chapter of the Association for Computational Linguistics (NAACL), 2020
Huda Khayrallah
João Sedoc
240
4
0
24 Oct 2020