arXiv:2509.23738

GUI-Shepherd: Reliable Process Reward and Verification for Long-Sequence GUI Tasks

28 September 2025
Cong Chen, Kaixiang Ji, Hao Zhong, Huanyi Zheng, Anzhou Li, Guo Gan, Ziyuan Huang, Cheng Zou, Jiajia Liu, Jingdong Chen, Hao Chen, Chunhua Shen
Main: 9 pages · 9 figures · Bibliography: 4 pages · 6 tables · Appendix: 9 pages
Abstract

Autonomous agents for long-sequence Graphical User Interface (GUI) tasks are hindered by sparse rewards and the intractable credit assignment problem. To address these challenges, we introduce GUI-Shepherd, a Process Reward Model that provides dense, step-by-step feedback to guide agents. GUI-Shepherd is trained on a diverse, large-scale dataset of 52k interactions featuring human-annotated scores and GPT-4o-generated rationales, enabling it to serve both as a reward provider for RL training and as a verifier at inference time. To the best of our knowledge, we are the first to conduct a systematic study of process supervision in GUI agents across diverse settings, from online long-horizon tasks to offline single-step prediction. On the online AndroidWorld benchmark, GUI-Shepherd improves the success rate by 7.7 points via multi-turn online PPO, significantly outperforming Outcome Reward Model based competitors. When used as an inference verifier, it brings a 5.1-point improvement. The benefits generalize to the offline AndroidControl benchmark, with gains of 2.2 points as a reward provider and 4.3 points as a verifier. Collectively, our results establish that high-fidelity process supervision is critical for building more capable GUI agents and present a generalizable solution.
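
The verifier role described in the abstract can be pictured as scoring candidate actions at each step and executing the highest-scoring one. The sketch below is an illustrative Python outline, not the paper's implementation: the Candidate structure, the prm_score interface, and the [0, 1] scoring scale are all assumptions introduced for illustration.

```python
# Minimal sketch of using a process reward model (PRM) as an inference-time
# verifier for GUI actions. The PRM interface, scoring scale, and action
# format below are illustrative assumptions, not the paper's actual method.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    action: str          # e.g. 'click(id=42)' or 'type("hello")'
    rationale: str       # the agent's reasoning for proposing this action


def verify_and_select(
    observation: str,
    candidates: List[Candidate],
    prm_score: Callable[[str, Candidate], float],
) -> Candidate:
    """Score each candidate step with the PRM and return the best one.

    `prm_score` is assumed to map (observation, candidate) to a scalar in
    [0, 1], where higher means the step is more likely to advance the task.
    """
    scored = [(prm_score(observation, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    best_score, best = scored[0]
    print(f"selected {best.action!r} with PRM score {best_score:.2f}")
    return best


if __name__ == "__main__":
    # Stub PRM for demonstration: prefers actions whose rationale mentions
    # an element that is actually visible in the current observation.
    def stub_prm(obs: str, cand: Candidate) -> float:
        return 1.0 if "Settings" in cand.rationale and "Settings" in obs else 0.1

    obs = "Home screen showing a 'Settings' button"
    candidates = [
        Candidate("click(id=7)", "Open Settings to reach the Wi-Fi menu"),
        Candidate("swipe(up)", "Scroll in case the target is off-screen"),
    ]
    verify_and_select(obs, candidates, stub_prm)
```

In an actual agent loop, the stub scorer would be replaced by the trained process reward model, and the same per-step scores could instead serve as dense rewards during PPO training, as the abstract describes.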
