A Multimodal GUI Architecture for Interfacing with LLM-Based Conversational Assistants

Comments: 20 pages main text + 4 pages bibliography, 16 figures, 2 tables
Abstract
Advances in large language models (LLMs) and real-time speech recognition now make it possible to issue any graphical user interface (GUI) action through natural language and receive the corresponding system response directly through the GUI. Yet most production applications were never designed with speech in mind. This article presents a concrete architecture that enables GUIs to interface with LLM-based, speech-enabled assistants.
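To make the idea concrete, the sketch below shows one plausible shape such an interface could take: GUI actions registered as named "tools" that an assistant can invoke after interpreting a transcribed utterance, with results routed back into the GUI. All names here (GuiTool, dispatch, the mocked tool call) are hypothetical illustrations under assumed conventions, not the paper's actual API.

```typescript
// A GUI action exposed to the assistant: a name, a natural-language
// description the LLM can match utterances against, and a handler
// that performs the action and returns a result to surface in the GUI.
interface GuiTool {
  name: string;
  description: string;
  handler: (args: Record<string, unknown>) => string;
}

// Registry of GUI actions the assistant is allowed to invoke.
const tools: GuiTool[] = [
  {
    name: "set_volume",
    description: "Set the playback volume (0-100).",
    handler: (args) => `Volume set to ${args["level"]}`,
  },
  {
    name: "open_settings",
    description: "Open the settings panel.",
    handler: () => "Settings panel opened",
  },
];

// Shape of a tool call as an LLM might return it after interpreting a
// transcribed utterance (the exact structure varies by provider).
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Dispatch the assistant's tool call to the matching GUI handler so the
// system response appears directly in the GUI rather than as speech only.
function dispatch(call: ToolCall): string {
  const tool = tools.find((t) => t.name === call.name);
  if (!tool) return `Unknown action: ${call.name}`;
  return tool.handler(call.args);
}

// Mocked assistant output for the utterance "turn the volume up to 80".
const call: ToolCall = { name: "set_volume", args: { level: 80 } };
console.log(dispatch(call)); // -> "Volume set to 80"
```

In a real deployment, the mocked ToolCall would instead come from an LLM given the tool registry and the speech recognizer's transcript; the dispatch step is where the GUI and the assistant meet.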
