
Fast Multi-Party Open-Ended Conversation with a Social Robot

Giulio Antonio Abbo
Maria Jose Pinto-Bernal
Martijn Catrycke
Tony Belpaeme
Main: 14 Pages
6 Figures
Bibliography: 4 Pages
4 Tables
Appendix: 8 Pages
Abstract

Multi-party open-ended conversation remains a major challenge in human-robot interaction, particularly when robots must recognise speakers, allocate turns, and respond coherently under overlapping or rapidly shifting dialogue. This paper presents a multi-party conversational system that combines multimodal perception (voice direction of arrival, speaker diarisation, face recognition) with a large language model (LLM) for response generation. Implemented on the Furhat robot, the system was evaluated with 30 participants across two scenarios: (i) parallel, separate conversations and (ii) shared group discussion. Results show that the system maintains coherent and engaging conversations, achieving high addressee accuracy in parallel settings (92.6%) and strong face recognition reliability (80–94%). Participants reported clear social presence and positive engagement, although technical barriers such as audio-based speaker recognition errors and response latency affected the fluidity of group interactions. The results highlight both the promise and limitations of LLM-based multi-party interaction and outline concrete directions for improving multimodal cue integration and responsiveness in future social robots.
