
UI-Venus-1.5 Technical Report

Venus Team
Changlong Gao
Zhangxuan Gu
Yulin Liu
Xinyu Qiu
Shuheng Shen
Yue Wen
Tianyu Xia
Zhenyu Xu
Zhengwen Zeng
Beitong Zhou
Xingran Zhou
Weizhi Chen
Sunhao Dai
Jingya Dou
Yichen Gong
Yuan Guo
Zhenlin Guo
Feng Li
Qian Li
Jinzhen Lin
Yuqi Zhou
Linchao Zhu
Liang Chen
Zhenyu Guo
Changhua Meng
Weiqiang Wang
Main: 22 pages, Bibliography: 5 pages, Appendix: 8 pages, 9 figures, 13 tables
Abstract

GUI agents have emerged as a powerful paradigm for automating interactions in digital environments, yet achieving both broad generality and consistently strong task performance remains challenging. In this report, we present UI-Venus-1.5, a unified, end-to-end GUI agent designed for robust real-world applications. The model family comprises two dense variants (2B and 8B) and one mixture-of-experts variant (30B-A3B) to suit different downstream application scenarios. Compared to our previous version, UI-Venus-1.5 introduces three key technical advances: (1) a comprehensive Mid-Training stage that uses 10 billion tokens across 30+ datasets to establish foundational GUI semantics; (2) Online Reinforcement Learning with full-trajectory rollouts, which aligns training objectives with long-horizon, dynamic navigation in large-scale environments; and (3) a single unified GUI agent constructed via Model Merging, which combines domain-specific models (grounding, web, and mobile) into one cohesive checkpoint. Extensive evaluations show that UI-Venus-1.5 establishes new state-of-the-art performance on benchmarks such as ScreenSpot-Pro (69.6%), VenusBench-GD (75.0%), and AndroidWorld (77.6%), significantly outperforming previous strong baselines. In addition, UI-Venus-1.5 demonstrates robust navigation across a variety of Chinese mobile apps, effectively executing user instructions in real-world scenarios. Code: this https URL. Model: this https URL.
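The abstract does not say which Model Merging method is used. As a rough illustration of the general idea only, the sketch below assumes the simplest variant, a weighted linear average of parameters across checkpoints that share one architecture. The function name `merge_checkpoints`, the parameter names, and the toy values are all hypothetical and are not taken from the report.

```python
from typing import Dict, List, Sequence

# A toy "checkpoint": parameter name -> flat list of floats.
# Real checkpoints hold tensors; plain lists keep the sketch dependency-free.
StateDict = Dict[str, List[float]]


def merge_checkpoints(
    checkpoints: Sequence[StateDict], weights: Sequence[float]
) -> StateDict:
    """Weighted linear average of parameters (an assumed merging method)."""
    if not checkpoints or len(checkpoints) != len(weights):
        raise ValueError("need one weight per checkpoint")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalize so coefficients sum to 1

    # All checkpoints must expose exactly the same parameter names.
    keys = set(checkpoints[0])
    if any(set(ckpt) != keys for ckpt in checkpoints[1:]):
        raise ValueError("checkpoints have mismatched parameter names")

    merged: StateDict = {}
    for name in checkpoints[0]:
        params = [ckpt[name] for ckpt in checkpoints]
        size = len(params[0])
        if any(len(p) != size for p in params):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across the checkpoints.
        merged[name] = [
            sum(w * p[i] for w, p in zip(norm, params)) for i in range(size)
        ]
    return merged


# Hypothetical domain-specific checkpoints with made-up values.
grounding = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
web = {"layer.weight": [3.0, 4.0], "layer.bias": [3.0]}
mobile = {"layer.weight": [5.0, 6.0], "layer.bias": [6.0]}

unified = merge_checkpoints([grounding, web, mobile], [1.0, 1.0, 1.0])
```

With equal weights, each merged parameter is the plain mean of the three inputs. Unequal weights would bias the merged checkpoint toward one domain; whether the report uses this scheme or a more elaborate one is not stated in the abstract.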
