UFO2: The Desktop AgentOS

Main: 22 Pages
28 Figures
Bibliography: 2 Pages
9 Tables
Abstract

Recent Computer-Using Agents (CUAs), powered by multimodal large language models (LLMs), offer a promising direction for automating complex desktop workflows through natural language. However, most existing CUAs remain conceptual prototypes, hindered by shallow OS integration, fragile screenshot-based interaction, and disruptive execution.
