Fast Multi-Party Open-Ended Conversation with a Social Robot

15 January 2025

Giulio Antonio Abbo

Maria Jose Pinto-Bernal

Martijn Catrycke

Tony Belpaeme

Main: 14 pages, 6 figures, 4 tables; Bibliography: 4 pages; Appendix: 8 pages

Abstract

Multi-party open-ended conversation remains a major challenge in human-robot interaction, particularly when robots must recognise speakers, allocate turns, and respond coherently under overlapping or rapidly shifting dialogue. This paper presents a multi-party conversational system that combines multimodal perception (voice direction of arrival, speaker diarisation, face recognition) with a large language model for response generation. Implemented on the Furhat robot, the system was evaluated with 30 participants across two scenarios: (i) parallel, separate conversations and (ii) shared group discussion. Results show that the system maintains coherent and engaging conversations, achieving high addressee accuracy in parallel settings (92.6%) and strong face recognition reliability (80-94%). Participants reported clear social presence and positive engagement, although technical barriers such as audio-based speaker recognition errors and response latency affected the fluidity of group interactions. The results highlight both the promise and limitations of LLM-based multi-party interaction and outline concrete directions for improving multimodal cue integration and responsiveness in future social robots.
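To make the idea of multimodal cue integration concrete, the following is a minimal sketch of one way a voice direction-of-arrival estimate could be matched against recognised faces to decide who is speaking. It is an illustration only and is not taken from the paper: the `TrackedFace` type, the `attribute_speaker` function, and the 20-degree tolerance are all hypothetical, and the actual system may fuse its cues quite differently.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TrackedFace:
    """A recognised face and where it sits relative to the robot (hypothetical type)."""
    person_id: str      # identity label assumed to come from face recognition
    azimuth_deg: float  # horizontal angle of the face, in degrees


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def attribute_speaker(
    doa_deg: float,
    faces: Sequence[TrackedFace],
    max_gap_deg: float = 20.0,  # assumed tolerance, not a value from the paper
) -> Optional[str]:
    """Return the person whose face is closest to the voice direction of arrival.

    Returns None when no face is visible or when the closest face is further
    than max_gap_deg from the estimated direction of the voice.
    """
    if not faces:
        return None
    best = min(faces, key=lambda f: angular_distance(doa_deg, f.azimuth_deg))
    if angular_distance(doa_deg, best.azimuth_deg) > max_gap_deg:
        return None
    return best.person_id
```

Returning `None` rather than forcing a match leaves room for a fallback cue, such as audio-based speaker diarisation, when the voice direction does not line up with any visible face.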
