arXiv: 2411.04890

GUI Agents with Foundation Models: A Comprehensive Survey

7 November 2024

Youssef Attia El Hili

Bin Wang

Chuhan Wu

Yasheng Wang

Ruiming Tang

Jianye Hao

Papers citing "GUI Agents with Foundation Models: A Comprehensive Survey"

50 / 77 papers shown

A Variance-Based Analysis of Sample Complexity for Grid Coverage

190

3

0

21 Nov 2025

DualTAP: A Dual-Task Adversarial Protector for Mobile MLLM Agents

Wei Yang Bryan Lim

192

1

0

17 Nov 2025

GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding

175

1

0

02 Nov 2025

CORE: Reducing UI Exposure in Mobile Agents via Collaboration Between Cloud and Local LLMs

132

1

0

17 Oct 2025

Practical and Stealthy Touch-Guided Jailbreak Attacks on Deployed Mobile Vision-Language Agents

466

0

0

09 Oct 2025

Training-Free Group Relative Policy Optimization

...

259

7

0

09 Oct 2025

ReInAgent: A Context-Aware GUI Agent Enabling Human-in-the-Loop Mobile Task Navigation

123

0

0

09 Oct 2025

Say One Thing, Do Another? Diagnosing Reasoning-Execution Gaps in VLM-Powered Mobile-Use Agents

Zhuosheng Zhang

162

0

0

02 Oct 2025

PAL-UI: Planning with Active Look-back for Vision-Based GUI Agents

166

2

0

01 Oct 2025

Agent-ScanKit: Unraveling Memory and Reasoning of Multimodal Agents via Sensitivity Perturbations

Zhuosheng Zhang

405

0

0

01 Oct 2025

Generalist Scanner Meets Specialist Locator: A Synergistic Coarse-to-Fine Framework for Robust GUI Grounding

109

3

0

29 Sep 2025

Secure and Efficient Access Control for Computer-Use Agents via Context Space

260

0

0

26 Sep 2025

GUI-ReWalk: Massive Data Generation for GUI Agent via Stochastic Exploration and Intent-Aware Reasoning

201

0

0

19 Sep 2025

BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent

...

381

3

0

19 Sep 2025

Small Models, Big Results: Achieving Superior Intent Extraction through Decomposition

Omri Berkovitch

119

0

0

15 Sep 2025

Realistic Environmental Injection Attacks on GUI Agents

122

2

0

14 Sep 2025

Instruction Agent: Enhancing Agent with Expert Demonstration

Hailey Hultquist

115

0

0

08 Sep 2025

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

...

288

54

0

02 Sep 2025

A Multimodal GUI Architecture for Interfacing with LLM-Based Conversational Assistants

Hans G.W. van Dam

271

0

0

31 Aug 2025

SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control

220

0

0

27 Aug 2025

Mobile-Agent-v3: Fundamental Agents for GUI Automation

...

269

50

0

21 Aug 2025

Cybernaut: Towards Reliable Web Automation

Indranil Bhattacharya

Francesco Carbone

121

1

0

21 Aug 2025

UI-Venus Technical Report: Building High-performance UI Agents with RFT

...

330

21

0

14 Aug 2025

Quick on the Uptake: Eliciting Implicit Intents from Human Demonstrations for Personalized Mobile-Use Agents

Zhuosheng Zhang

145

6

0

12 Aug 2025

OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use

...

LLMAG LM&Ro AI4TS

244

32

0

06 Aug 2025

Uncertainty-Aware GUI Agent: Adaptive Perception through Component Recommendation and Human-in-the-Loop Refinement

208

8

0

06 Aug 2025

NaviMaster: Learning a Unified Policy for GUI and Embodied Navigation Tasks

Wentao Yan and Jingyu Gong

220

6

0

04 Aug 2025

Evaluation and Benchmarking of LLM Agents: A Survey

Mahmoud Mohammadi

422

40

0

29 Jul 2025

UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding

682

13

0

29 Jul 2025

Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study

420

1

0

24 Jun 2025

LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization

...

348

9

0

11 Jun 2025

Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Scheduling System

Zhuosheng Zhang

227

6

0

10 Jun 2025

DeepShop: A Benchmark for Deep Research Shopping Agents

Maarten de Rijke

341

14

0

03 Jun 2025

AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning

...

358

24

0

02 Jun 2025

Robot Operation of Home Appliances by Reading User Manuals

349

1

0

26 May 2025

TransBench: Breaking Barriers for Transferable Graphical User Interface Agents in Dynamic Digital Environments

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

509

1

0

23 May 2025

ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search

398

5

0

21 May 2025

Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents

Zhuosheng Zhang

399

5

0

20 May 2025

Mobile-Agent-V: A Video-Guided Approach for Effortless and Efficient Operational Knowledge Injection in Mobile Automation

476

0

0

20 May 2025

From Assistants to Adversaries: Exploring the Security Risks of Mobile LLM Agents

457

14

0

19 May 2025

Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI

Benjamin Raphael Ernhofer

Daniil Prokhorov

Jannica Langner

Dominik Bollmann

336

1

0

09 May 2025

TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents

482

21

0

17 Apr 2025

Towards Trustworthy GUI Agents: A Survey

291

18

0

30 Mar 2025

VeriSafe Agent: Safeguarding Mobile GUI Agent via Logic-based Action Verification

368

5

0

24 Mar 2025

Are AI Agents interacting with Online Ads?

Andreas Stöckl

479

2

0

20 Mar 2025

CHOP: Mobile Operating Assistant with Constrained High-frequency Optimized Subtask Planning

316

4

0

05 Mar 2025

VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model

Saravan Rajmohan

320

15

0

26 Feb 2025

Beyond In-Distribution Success: Scaling Curves of CoT Granularity for Language Model Generalization

411

6

0

25 Feb 2025

Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks

Zhenhailong Wang

495

81

0

20 Jan 2025

SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation

International Conference on Learning Representations (ICLR), 2024

...

Yasheng Wang

Jun Wang

Youssef Attia El Hili

522

47

0

19 Oct 2024
