
OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use

6 August 2025
Xueyu Hu
Tao Xiong
Biao Yi
Zishu Wei
Ruixuan Xiao
Yurun Chen
Jiasheng Ye
Meiling Tao
Xiangxin Zhou
Ziyu Zhao
Yuhuai Li
Shengze Xu
Shenzhi Wang
Xinchen Xu
Shuofei Qiao
Zhaokai Wang
Kun Kuang
Tieyong Zeng
Liang Wang
Jiwei Li
Yuchen Eleanor Jiang
Wangchunshu Zhou
Guoyin Wang
Keting Yin
Zhou Zhao
Hongxia Yang
Fan Wu
Shengyu Zhang
Fei Wu
Main: 27 pages, 3 figures, 3 tables
Bibliography: 8 pages
Appendix: 1 page
Abstract

The dream of creating AI assistants as capable and versatile as the fictional J.A.R.V.I.S. from Iron Man has long captivated imaginations. With the evolution of (multimodal) large language models ((M)LLMs), this dream is closer to reality: (M)LLM-based agents that use computing devices (e.g., computers and mobile phones) have advanced significantly. These agents automate tasks by operating within the environments and interfaces, such as the graphical user interface (GUI), provided by operating systems (OS). This paper presents a comprehensive survey of these agents, designated as OS Agents. We begin by elucidating the fundamentals of OS Agents, exploring their key components (the environment, the observation space, and the action space) and outlining essential capabilities such as understanding, planning, and grounding. We then examine methodologies for constructing OS Agents, focusing on domain-specific foundation models and agent frameworks. A detailed review of evaluation protocols and benchmarks highlights how OS Agents are assessed across diverse tasks. Finally, we discuss current challenges and identify promising directions for future research, including safety and privacy as well as personalization and self-evolution. This survey aims to consolidate the state of OS Agents research, providing insights to guide both academic inquiry and industrial development. An open-source GitHub repository is maintained as a dynamic resource to foster further innovation in this field. We present a 9-page version of our work, accepted by ACL 2025, to provide a concise overview of the domain.
