arXiv: 2411.04890

GUI Agents with Foundation Models: A Comprehensive Survey

7 November 2024

Youssef Attia El Hili

Bin Wang

Chuhan Wu

Yasheng Wang

Ruiming Tang

Jianye Hao

Papers citing "GUI Agents with Foundation Models: A Comprehensive Survey"

50 / 77 papers shown

GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding

222

2

0

30 Mar 2026

A Variance-Based Analysis of Sample Complexity for Grid Coverage

241

5

0

21 Nov 2025

DualTAP: A Dual-Task Adversarial Protector for Mobile MLLM Agents

Wei Yang Bryan Lim

238

2

0

17 Nov 2025

CORE: Reducing UI Exposure in Mobile Agents via Collaboration Between Cloud and Local LLMs

165

5

0

17 Oct 2025

Practical and Stealthy Touch-Guided Jailbreak Attacks on Deployed Mobile Vision-Language Agents

507

0

0

09 Oct 2025

Training-Free Group Relative Policy Optimization

...

350

17

0

09 Oct 2025

ReInAgent: A Context-Aware GUI Agent Enabling Human-in-the-Loop Mobile Task Navigation

150

0

0

09 Oct 2025

Say One Thing, Do Another? Diagnosing Reasoning-Execution Gaps in VLM-Powered Mobile-Use Agents

Zhuosheng Zhang

195

1

0

02 Oct 2025

PAL-UI: Planning with Active Look-back for Vision-Based GUI Agents

205

3

0

01 Oct 2025

Agent-ScanKit: Unraveling Memory and Reasoning of Multimodal Agents via Sensitivity Perturbations

Zhuosheng Zhang

441

1

0

01 Oct 2025

Generalist Scanner Meets Specialist Locator: A Synergistic Coarse-to-Fine Framework for Robust GUI Grounding

131

3

0

29 Sep 2025

Secure and Efficient Access Control for Computer-Use Agents via Context Space

339

0

0

26 Sep 2025

GUI-ReWalk: Massive Data Generation for GUI Agent via Stochastic Exploration and Intent-Aware Reasoning

221

2

0

19 Sep 2025

BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent

...

444

3

0

19 Sep 2025

Small Models, Big Results: Achieving Superior Intent Extraction through Decomposition

Omri Berkovitch

146

1

0

15 Sep 2025

Environmental Injection Attacks against GUI Agents in Realistic Dynamic Environments

230

2

0

14 Sep 2025

Instruction Agent: Enhancing Agent with Expert Demonstration

Hailey Hultquist

138

0

0

08 Sep 2025

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

...

434

85

0

02 Sep 2025

A Multimodal GUI Architecture for Interfacing with LLM-Based Conversational Assistants

Hans G.W. van Dam

309

0

0

31 Aug 2025

SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control

289

0

0

27 Aug 2025

Mobile-Agent-v3: Fundamental Agents for GUI Automation

...

342

71

0

21 Aug 2025

Cybernaut: Towards Reliable Web Automation

Indranil Bhattacharya

Francesco Carbone

138

1

0

21 Aug 2025

UI-Venus Technical Report: Building High-performance UI Agents with RFT

...

473

35

0

14 Aug 2025

Quick on the Uptake: Eliciting Implicit Intents from Human Demonstrations for Personalized Mobile-Use Agents

Zhuosheng Zhang

225

7

0

12 Aug 2025

OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use

...

LLMAG LM&Ro AI4TS

341

42

0

06 Aug 2025

Uncertainty-Aware GUI Agent: Adaptive Perception through Component Recommendation and Human-in-the-Loop Refinement

347

9

0

06 Aug 2025

NaviMaster: Learning a Unified Policy for GUI and Embodied Navigation Tasks

Wentao Yan and Jingyu Gong

254

8

0

04 Aug 2025

Evaluation and Benchmarking of LLM Agents: A Survey

Mahmoud Mohammadi

643

68

0

29 Jul 2025

UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding

802

16

0

29 Jul 2025

Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study

495

1

0

24 Jun 2025

LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization

...

379

11

0

11 Jun 2025

Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Scheduling System

Zhuosheng Zhang

335

7

0

10 Jun 2025

DeepShop: A Benchmark for Deep Research Shopping Agents

Maarten de Rijke

464

21

0

03 Jun 2025

AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning

...

405

31

0

02 Jun 2025

Robot Operation of Home Appliances by Reading User Manuals

434

1

0

26 May 2025

TransBench: Breaking Barriers for Transferable Graphical User Interface Agents in Dynamic Digital Environments

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

557

2

0

23 May 2025

ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search

430

6

0

21 May 2025

Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents

Zhuosheng Zhang

437

6

0

20 May 2025

Mobile-Agent-V: A Video-Guided Approach for Effortless and Efficient Operational Knowledge Injection in Mobile Automation

528

0

0

20 May 2025

From Assistants to Adversaries: Exploring the Security Risks of Mobile LLM Agents

484

16

0

19 May 2025

Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI

Benjamin Raphael Ernhofer

Daniil Prokhorov

Jannica Langner

Dominik Bollmann

370

1

0

09 May 2025

TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents

554

21

0

17 Apr 2025

Towards Trustworthy GUI Agents: A Survey

Ninghao Liu

365

22

0

30 Mar 2025

VeriSafe Agent: Safeguarding Mobile GUI Agent via Logic-based Action Verification

383

5

0

24 Mar 2025

Are AI Agents interacting with Online Ads?

Andreas Stöckl

527

2

0

20 Mar 2025

CHOP: Mobile Operating Assistant with Constrained High-frequency Optimized Subtask Planning

337

4

0

05 Mar 2025

VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model

Saravan Rajmohan

413

15

0

26 Feb 2025

Beyond In-Distribution Success: Scaling Curves of CoT Granularity for Language Model Generalization

445

6

0

25 Feb 2025

Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks

Zhenhailong Wang

596

94

0

20 Jan 2025

SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation

International Conference on Learning Representations (ICLR), 2024

...

Yasheng Wang

Jun Wang

Youssef Attia El Hili

634

55

0

19 Oct 2024
