ScreenLLM: Stateful Screen Schema for Efficient Action Understanding and Prediction

26 March 2025
Yiqiao Jin
Stefano Petrangeli
Yu Shen
Gang Wu
Abstract

Graphical User Interface (GUI) agents are autonomous systems that interpret and generate actions, enabling intelligent user assistance and automation. Effective training of these agents presents unique challenges, such as sparsity in supervision signals, scalability for large datasets, and the need for nuanced user understanding. We propose the stateful screen schema, an efficient representation of GUI interactions that captures key user actions and intentions over time. Building on this foundation, we introduce ScreenLLM, a set of multimodal large language models (MLLMs) tailored for advanced UI understanding and action prediction. Extensive experiments on both open-source and proprietary models show that ScreenLLM accurately models user behavior and predicts actions. Our work lays the foundation for scalable, robust, and intelligent GUI agents that enhance user interaction in diverse software environments.
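The abstract does not specify the internal format of the stateful screen schema, so the following is purely an illustrative sketch of the general idea: a compact, per-timestep trace of salient UI elements, the action taken, and the inferred intent, serialized into text an MLLM could consume. All names here (`ScreenState`, `ScreenSchema`, `to_prompt`) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class ScreenState:
    """One timestep in a hypothetical stateful screen schema."""
    timestamp: float       # seconds since session start
    ui_elements: list      # salient widgets visible on screen
    action: str            # user action taken at this step
    intent: str            # inferred user intention behind the action

@dataclass
class ScreenSchema:
    """Compact, stateful trace of a GUI session keeping only key actions."""
    states: list = field(default_factory=list)

    def record(self, state: ScreenState) -> None:
        self.states.append(state)

    def to_prompt(self) -> str:
        """Serialize the trace into a text block an MLLM could consume."""
        return "\n".join(
            f"[t={s.timestamp:.1f}] elements={s.ui_elements} "
            f"action={s.action} intent={s.intent}"
            for s in self.states
        )

# Toy usage: two key steps of a "save the file" interaction.
schema = ScreenSchema()
schema.record(ScreenState(0.0, ["File menu"], "click(File menu)",
                          "open file operations"))
schema.record(ScreenState(1.5, ["Save button"], "click(Save button)",
                          "persist the document"))
print(schema.to_prompt())
```

The point of such a representation, per the abstract, is to keep supervision dense and compact: rather than storing full screenshots at every frame, only the key actions and their inferred intents are retained over time.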

@article{jin2025_2503.20978,
  title={ScreenLLM: Stateful Screen Schema for Efficient Action Understanding and Prediction},
  author={Yiqiao Jin and Stefano Petrangeli and Yu Shen and Gang Wu},
  journal={arXiv preprint arXiv:2503.20978},
  year={2025}
}