UI-Venus-1.5 Technical Report

9 February 2026

Venus Team

Changlong Gao

Zhangxuan Gu

Yulin Liu

Xinyu Qiu

Shuheng Shen

Yue Wen

Tianyu Xia

Zhenyu Xu

Zhengwen Zeng

Beitong Zhou

Xingran Zhou

Weizhi Chen

Sunhao Dai

Jingya Dou

Yichen Gong

Yuan Guo

Zhenlin Guo

Feng Li

Qian Li

Jinzhen Lin

Yuqi Zhou

Linchao Zhu

Liang Chen

Zhenyu Guo

Changhua Meng

Weiqiang Wang

Main: 22 Pages

9 Figures

Bibliography: 5 Pages

13 Tables

Appendix: 8 Pages

Abstract

GUI agents have emerged as a powerful paradigm for automating interactions in digital environments, yet achieving both broad generality and consistently strong task performance remains challenging. In this report, we present UI-Venus-1.5, a unified, end-to-end GUI agent designed for robust real-world applications. The model family comprises two dense variants (2B and 8B) and one mixture-of-experts variant (30B-A3B) to cover a range of downstream application scenarios. Compared to our previous version, UI-Venus-1.5 introduces three key technical advances: (1) a comprehensive Mid-Training stage leveraging 10 billion tokens across 30+ datasets to establish foundational GUI semantics; (2) Online Reinforcement Learning with full-trajectory rollouts, aligning training objectives with long-horizon, dynamic navigation in large-scale environments; and (3) a single unified GUI agent constructed via Model Merging, which combines domain-specific models (grounding, web, and mobile) into one cohesive checkpoint. Extensive evaluations show that UI-Venus-1.5 establishes new state-of-the-art performance on benchmarks such as ScreenSpot-Pro (69.6%), VenusBench-GD (75.0%), and AndroidWorld (77.6%), significantly outperforming previous strong baselines. In addition, UI-Venus-1.5 demonstrates robust navigation capabilities across a variety of Chinese mobile apps, effectively executing user instructions in real-world scenarios. Code: this https URL. Model: this https URL.
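To make the Model Merging idea in point (3) concrete, here is a minimal sketch of one common merging technique, weighted parameter averaging. The abstract does not say which merging method UI-Venus-1.5 actually uses, so this is an illustration under stated assumptions and not the authors' procedure. The function name `merge_checkpoints`, the toy parameter name `layer.weight`, and the plain-list "tensors" are all hypothetical stand-ins.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a real checkpoint: parameter name -> flat values.
StateDict = Dict[str, List[float]]


def merge_checkpoints(checkpoints: Sequence[StateDict],
                      weights: Sequence[float]) -> StateDict:
    """Weighted average of parameters that share names and shapes.

    Assumption: all checkpoints come from the same architecture, so every
    parameter name and length matches across them.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if len(checkpoints) != len(weights):
        raise ValueError("one weight per checkpoint is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalize so weights sum to 1

    reference = checkpoints[0]
    for ckpt in checkpoints:
        if set(ckpt) != set(reference):
            raise ValueError("checkpoints have different parameter names")

    merged: StateDict = {}
    for name, ref_values in reference.items():
        for ckpt in checkpoints:
            if len(ckpt[name]) != len(ref_values):
                raise ValueError(f"shape mismatch for {name!r}")
        # Element-wise weighted sum across all checkpoints.
        merged[name] = [
            sum(w * ckpt[name][i] for w, ckpt in zip(norm, checkpoints))
            for i in range(len(ref_values))
        ]
    return merged


# Toy checkpoints standing in for grounding, web, and mobile specialists.
grounding = {"layer.weight": [1.0, 2.0]}
web = {"layer.weight": [3.0, 4.0]}
mobile = {"layer.weight": [5.0, 6.0]}

unified = merge_checkpoints([grounding, web, mobile], weights=[1.0, 1.0, 1.0])
```

With equal weights this reduces to a plain parameter mean. Unequal weights let one specialist dominate, which is one simple way to trade off grounding against navigation behavior in a merged checkpoint.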