Conversational assistants are becoming increasingly popular, including in healthcare, partly because of the availability and capabilities of Large Language Models. There is a need for controlled, probing evaluations with real stakeholders that can highlight the advantages and disadvantages of more traditional architectures and those based on generative AI. We present a within-group user study comparing two versions of a conversational assistant that allows heart failure patients to ask about salt content in food. One version of the system was developed in-house with a neurosymbolic architecture, and the other is based on ChatGPT. The evaluation shows that the in-house system is more accurate, completes more tasks, and is less verbose than the one based on ChatGPT; on the other hand, the one based on ChatGPT makes fewer speech errors and requires fewer clarifications to complete the task. Patients show no preference for one over the other.
@article{tayal2025_2504.17753,
  title={Conversational Assistants to support Heart Failure Patients: comparing a Neurosymbolic Architecture with ChatGPT},
  author={Anuja Tayal and Devika Salunke and Barbara Di Eugenio and Paula Allen-Meares and Eulalia Puig Abril and Olga Garcia and Carolyn Dickens and Andrew Boyd},
  journal={arXiv preprint arXiv:2504.17753},
  year={2025}
}