UFO: A UI-Focused Agent for Windows OS Interaction

8 February 2024

Papers citing "UFO: A UI-Focused Agent for Windows OS Interaction"

50 / 83 papers shown

GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding

230

30 Mar 2026

Prune4Web: DOM Tree Pruning Programming for Web Agent

452

26 Nov 2025

WebCoach: Self-Evolving Web Agents with Cross-Session Memory Guidance

787

17 Nov 2025

An Efficient Training Pipeline for Reasoning Graphical User Interface Agents

Georgios Pantazopoulos

Eda B. Özyiğit

LRM

441

11 Nov 2025

Empowering Real-World: A Survey on the Technology, Practice, and Evaluation of LLM-driven Industry Agents

...

214

20 Oct 2025

SAG-Agent: Enabling Long-Horizon Reasoning in Strategy Games via Dynamic Knowledge Graphs

Liangli Zhen

Jiancheng Lv

LRM

236

17 Oct 2025

CORE: Reducing UI Exposure in Mobile Agents via Collaboration Between Cloud and Local LLMs

176

17 Oct 2025

OpenDerisk: An Industrial Framework for AI-Driven SRE, with Design, Implementation, and Case Studies

...

238

15 Oct 2025

vAttention: Verified Sparse Attention

Aditya Desai

Kumar Krishna Agrawal

Shuo Yang

Alejandro Cuadron

Luis Gaspar Schroeder

155

07 Oct 2025

From Imperative to Declarative: Towards LLM-friendly OS Interfaces for Boosted Computer-Use Agents

250

06 Oct 2025

LEGOMem: Modular Procedural Memory for Multi-agent LLM Systems for Workflow Automation

155

06 Oct 2025

Improving GUI Grounding with Explicit Position-to-Coordinate Mapping

124

03 Oct 2025

Agent-ScanKit: Unraveling Memory and Reasoning of Multimodal Agents via Sensitivity Perturbations

446

01 Oct 2025

Learning GUI Grounding with Spatial Reasoning from Visual Feedback

...

158

25 Sep 2025

Towards Understanding Visual Grounding in Visual Language Models

Georgios Pantazopoulos

Eda B. Özyiğit

ObjD

507

12 Sep 2025

Instruction Agent: Enhancing Agent with Expert Demonstration

143

08 Sep 2025

Mobile-Agent-v3: Fundamental Agents for GUI Automation

...

356

21 Aug 2025

MagicGUI: A Foundational Mobile GUI Agent with Scalable Data Pipeline and Reinforcement Fine-tuning

...

526

19 Jul 2025

Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Scheduling System

354

10 Jun 2025

Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation

...

450

05 Jun 2025

macOSWorld: A Multilingual Interactive Benchmark for GUI Agents

623

04 Jun 2025

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

...

362

03 Jun 2025

Text2Grad: Reinforcement Learning from Natural Language Feedback

333

28 May 2025

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

...

574

26 May 2025

ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World

243

25 May 2025

LA-RCS: LLM-Agent-Based Robot Control System

341

23 May 2025

ProgRM: Build Better GUI Agents with Progress Rewards

294

23 May 2025

Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents

446

20 May 2025

Mobile-Agent-V: A Video-Guided Approach for Effortless and Efficient Operational Knowledge Injection in Mobile Automation

559

20 May 2025

From Assistants to Adversaries: Exploring the Security Risks of Mobile LLM Agents

495

19 May 2025

Can Global XAI Methods Reveal Injected Bias in LLMs? SHAP vs Rule Extraction vs RuleSHAP

Francesco Sovrano

682

167

16 May 2025

Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning

406

01 May 2025

Toward a Human-Centered Evaluation Framework for Trustworthy LLM-Powered GUI Agents

Simret Araya Gebreegziabher

509

24 Apr 2025

UFO2: The Desktop AgentOS

...

835

20 Apr 2025

TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents

580

17 Apr 2025

The Obvious Invisible Threat: LLM-Powered GUI Agents' Vulnerability to Fine-Print Injections

...

490

15 Apr 2025

ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use

445

157

04 Apr 2025

A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models

...

837

30 Mar 2025

ScreenLLM: Stateful Screen Schema for Efficient Action Understanding and Prediction

The Web Conference (WWW), 2025

1.0K

26 Mar 2025

Boosting Virtual Agent Learning and Reasoning: A Step-Wise, Multi-Dimensional, and Generalist Reward Model with Benchmark

489

24 Mar 2025

API Agents vs. GUI Agents: Divergence and Convergence

546

14 Mar 2025

CHOP: Mobile Operating Assistant with Constrained High-frequency Optimized Subtask Planning

362

05 Mar 2025

AppAgentX: Evolving GUI Agents as Proficient Smartphone Users

601

04 Mar 2025

Smoothing Grounding and Reasoning for MLLM-Powered GUI Agents with Query-Oriented Pivot Tasks

427

01 Mar 2025

Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation

356

26 Feb 2025

VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model

420

26 Feb 2025

AgentStudio: A Toolkit for Building General Virtual Agents

International Conference on Learning Representations (ICLR), 2024

566

17 Feb 2025

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

...

661

149

28 Jan 2025

Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks

616

20 Jan 2025

Aria-UI: Visual Grounding for GUI Instructions

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

647

109

20 Dec 2024