UFO2: The Desktop AgentOS
Main: 22 Pages
28 Figures
Bibliography: 2 Pages
9 Tables
Abstract
Recent Computer-Using Agents (CUAs), powered by multimodal large language models (LLMs), offer a promising direction for automating complex desktop workflows through natural language. However, most existing CUAs remain conceptual prototypes, hindered by shallow OS integration, fragile screenshot-based interaction, and disruptive execution.
