  3. 2510.19949
123
0
v1v2 (latest)

Surfer 2: The Next Generation of Cross-Platform Computer Use Agents

22 October 2025
M. Andreux
Märt Bakler
Yanael Barbier
Hamza Ben Chekroun
Emilien Biré
Antoine Bonnet
Riaz Bordie
Nathan Bout
Matthias Brunel
Aleix Cambray
Pierre-Louis Cedoz
Antoine Chassang
Gautier Cloix
Ethan Connelly
Alexandra D. Constantinou
Ramzi De Coster
Hubert de La Jonquière
Aurélien Delfosse
Maxime Delpit
Alexis Deprez
Augustin Derupti
Mathieu Diaz
Shannon D'Souza
Julie Dujardin
Abai Edmund
Michael Eickenberg
Armand Fatalot
Wissem Felissi
Isaac Herring
Xavier Koegler
Erwan Le Jumeau de Kergaradec
Aurélien Lac
Maxime Langevin
Corentin Lauverjat
Antonio Loison
Avshalom Manevich
Axel Moyal
Axel Nguyen Kerbel
Marinela Parovic
Julien Revelle
Guillaume Richard
Mats L. Richter
Ronan Riochet
María Santos
Romain Savidan
Laurent Sifre
Maxime Theillard
Marc Thibault
Ivan Valentini
Tony Wu
Laura Yie
Kai Yuan
Jevgenij Zubovskij
Main: 6 pages, 9 figures, 2 tables
Appendix: 15 pages
Abstract

Building agents that generalize across web, desktop, and mobile environments remains an open challenge, as prior systems rely on environment-specific interfaces that limit cross-platform deployment. We introduce Surfer 2, a unified architecture operating purely from visual observations that achieves state-of-the-art performance across all three environments. Surfer 2 integrates hierarchical context management, decoupled planning and execution, and self-verification with adaptive recovery, enabling reliable operation over long task horizons. Our system achieves 97.1% accuracy on WebVoyager, 69.6% on WebArena, 60.1% on OSWorld, and 87.1% on AndroidWorld, outperforming all prior systems without task-specific fine-tuning. With multiple attempts, Surfer 2 exceeds human performance on all benchmarks. These results demonstrate that systematic orchestration amplifies foundation model capabilities and enables general-purpose computer control through visual interaction alone, while calling for a next-generation vision language model to achieve Pareto-optimal cost-efficiency.
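
To make the "decoupled planning and execution" and "self-verification with adaptive recovery" ideas concrete, here is a minimal sketch of such a control loop. It is not the Surfer 2 implementation: the names `run_task`, `StepResult`, `max_retries`, and the toy planner, executor, and verifier are all hypothetical, and a real agent would work from screenshots and call a vision language model rather than compare strings.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    subgoal: str
    ok: bool
    attempts: int


def run_task(
    plan: Callable[[str], List[str]],
    execute: Callable[[str, int], str],
    verify: Callable[[str, str], bool],
    task: str,
    max_retries: int = 2,
) -> List[StepResult]:
    """Plan once, then execute and verify each subgoal, retrying on failure."""
    results: List[StepResult] = []
    for subgoal in plan(task):  # the planner is separate from the executor
        ok = False
        attempts = 0
        while attempts <= max_retries and not ok:
            attempts += 1
            observation = execute(subgoal, attempts)  # act on the environment
            ok = verify(subgoal, observation)         # self-verification step
        results.append(StepResult(subgoal, ok, attempts))
        if not ok:
            break  # stop once retries are exhausted for a subgoal
    return results


# Hypothetical stand-ins so the sketch runs without any model or device.
def toy_plan(task: str) -> List[str]:
    return ["open page", "submit form"]


def toy_execute(subgoal: str, attempt: int) -> str:
    # Simulate a flaky action: "submit form" fails on its first attempt.
    if subgoal == "submit form" and attempt < 2:
        return "error"
    return "done"


def toy_verify(subgoal: str, observation: str) -> bool:
    return observation == "done"


if __name__ == "__main__":
    for step in run_task(toy_plan, toy_execute, toy_verify, "demo task"):
        print(step)
```

In this toy run the second subgoal fails verification once and succeeds on the retry, which is the recovery behavior the abstract describes at a high level. How Surfer 2 actually plans, verifies, and recovers is specified in the paper, not here.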
