A Multimodal GUI Architecture for Interfacing with LLM-Based Conversational Assistants

Comments: 20 pages main text + 4 pages bibliography, 16 figures, 2 tables
Abstract
Advances in large language models (LLMs) and real-time speech recognition now make it possible to issue any graphical user interface (GUI) action through natural language and receive the corresponding system response directly through the GUI. Yet most production applications were never designed with speech in mind. This article presents a concrete architecture that enables GUIs to interface with LLM-based, speech-enabled assistants.
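To make the idea concrete, the sketch below shows one plausible shape such an interface could take: GUI actions registered as named "tools" that an assistant can invoke after interpreting a transcribed utterance, with results routed back into the GUI. All names here (GuiTool, dispatch, the mocked tool call) are hypothetical illustrations under assumed conventions, not the paper's actual API.

```typescript
// A GUI action exposed to the assistant: a name, a natural-language
// description the LLM can match utterances against, and a handler
// that performs the action and returns a result to surface in the GUI.
interface GuiTool {
  name: string;
  description: string;
  handler: (args: Record<string, unknown>) => string;
}

// Registry of GUI actions the assistant is allowed to invoke.
const tools: GuiTool[] = [
  {
    name: "set_volume",
    description: "Set the playback volume (0-100).",
    handler: (args) => `Volume set to ${args["level"]}`,
  },
  {
    name: "open_settings",
    description: "Open the settings panel.",
    handler: () => "Settings panel opened",
  },
];

// Shape of a tool call as an LLM might return it after interpreting a
// transcribed utterance (the exact structure varies by provider).
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Dispatch the assistant's tool call to the matching GUI handler so the
// system response appears directly in the GUI rather than as speech only.
function dispatch(call: ToolCall): string {
  const tool = tools.find((t) => t.name === call.name);
  if (!tool) return `Unknown action: ${call.name}`;
  return tool.handler(call.args);
}

// Mocked assistant output for the utterance "turn the volume up to 80".
const call: ToolCall = { name: "set_volume", args: { level: 80 } };
console.log(dispatch(call)); // -> "Volume set to 80"
```

In a real deployment, the mocked ToolCall would instead come from an LLM given the tool registry and the speech recognizer's transcript; the dispatch step is where the GUI and the assistant meet.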
