GUI Agents with Foundation Models: A Comprehensive Survey

7 November 2024
Shuai Wang
Wen Liu
Jingxuan Chen
Weinan Gan
Xingshan Zeng
Shuai Yu
Xinlong Hao
Youssef Attia El Hili
Yasheng Wang
Ruiming Tang
Bin Wang
Chuhan Wu
Jianye Hao
    LLMAG

Papers citing "GUI Agents with Foundation Models: A Comprehensive Survey"

50 / 77 papers shown
Environmental Injection Attacks against GUI Agents in Realistic Dynamic Environments
Yitong Zhang
Ximo Li
L. Cai
Jia Li
LLMAG AAML
122
2
0
03 Feb 2026
A Variance-Based Analysis of Sample Complexity for Grid Coverage
Lyu Yuhuan
190
3
0
21 Nov 2025
DualTAP: A Dual-Task Adversarial Protector for Mobile MLLM Agents
Fuyao Zhang
Jiaming Zhang
C. Wang
Xiongtao Sun
Yurong Hao
Guowei Guan
Wenjie Li
Longtao Huang
Wei Yang Bryan Lim
AAML
192
1
0
17 Nov 2025
GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding
Shijie Zhou
Viet Dac Lai
Hao Tan
Jihyung Kil
Wanrong Zhu
Changyou Chen
Ruiyi Zhang
175
1
0
02 Nov 2025
CORE: Reducing UI Exposure in Mobile Agents via Collaboration Between Cloud and Local LLMs
Gucongcong Fan
Chaoyue Niu
Chengfei Lyu
Fan Wu
Guihai Chen
132
1
0
17 Oct 2025
Practical and Stealthy Touch-Guided Jailbreak Attacks on Deployed Mobile Vision-Language Agents
Renhua Ding
Xiao Yang
Zhengwei Fang
Jun Luo
Kun He
Jun Zhu
AAML
466
0
0
09 Oct 2025
ReInAgent: A Context-Aware GUI Agent Enabling Human-in-the-Loop Mobile Task Navigation
Haitao Jia
Ming He
Zimo Yin
Likang Wu
Jianping Fan
Jitao Sang
123
0
0
09 Oct 2025
Training-Free Group Relative Policy Optimization
Yuzheng Cai
Siqi Cai
Yuchen Shi
Zihan Xu
Lichao Chen
...
Zongyi Li
Haojia Lin
Yong Mao
Ke Li
Xing Sun
OffRL
259
7
0
09 Oct 2025
Say One Thing, Do Another? Diagnosing Reasoning-Execution Gaps in VLM-Powered Mobile-Use Agents
Lingzhong Dong
Ziqi Zhou
Shuaibo Yang
Haiyue Sheng
Pengzhou Cheng
Zongru Wu
Zheng Wu
Gongshen Liu
Zhuosheng Zhang
LRM
162
0
0
02 Oct 2025
PAL-UI: Planning with Active Look-back for Vision-Based GUI Agents
Zikang Liu
Junyi Li
Wayne Xin Zhao
Dawei Gao
Yaliang Li
Ji-Rong Wen
LLMAG
166
2
0
01 Oct 2025
Agent-ScanKit: Unraveling Memory and Reasoning of Multimodal Agents via Sensitivity Perturbations
Pengzhou Cheng
Lingzhong Dong
Zeng Wu
Zongru Wu
Zhuosheng Zhang
Chengwei Qin
Gongshen Liu
LLMAG
405
0
0
01 Oct 2025
Generalist Scanner Meets Specialist Locator: A Synergistic Coarse-to-Fine Framework for Robust GUI Grounding
Zhecheng Li
Guoxian Song
Yiwei Wang
Zhen Xiong
Junsong Yuan
Yujun Cai
109
3
0
29 Sep 2025
Secure and Efficient Access Control for Computer-Use Agents via Context Space
Haochen Gong
Chenxiao Li
Rui Chang
Wenbo Shen
LLMAG
260
0
0
26 Sep 2025
GUI-ReWalk: Massive Data Generation for GUI Agent via Stochastic Exploration and Intent-Aware Reasoning
Musen Lin
M. Liu
Taoran Lu
L. Yuan
Yiwei Liu
Haonan Xu
Yu Miao
Yuhao Chao
Ruoyao Xiao
201
0
0
19 Sep 2025
BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent
Shaojie Zhang
Ruoceng Zhang
Pei Fu
S. Wang
Jiahui Yang
...
Shiqi Cui
Bin Qin
Ying Huang
Zhenbo Luo
Jian Luan
LLMAG MLLM
381
3
0
19 Sep 2025
Small Models, Big Results: Achieving Superior Intent Extraction through Decomposition
Danielle Cohen
Yoni Halpern
Noam Kahlon
Joel Oren
Omri Berkovitch
Sapir Caduri
Ido Dagan
Anatoly Efros
119
0
0
15 Sep 2025
Instruction Agent: Enhancing Agent with Expert Demonstration
Yinheng Li
Hailey Hultquist
Justin Wagle
K. Koishida
LLMAG
115
0
0
08 Sep 2025
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
Haoming Wang
Haoyang Zou
Huatong Song
J. Feng
Junjie Fang
...
Xianzheng Ma
Xiaojun Xiao
X. Y. Huang
Xinjie Chen
Yidi Du
LLMAG
288
54
0
02 Sep 2025
A Multimodal GUI Architecture for Interfacing with LLM-Based Conversational Assistants
Hans G.W. van Dam
271
0
0
31 Aug 2025
SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control
Quanfeng Lu
Zhantao Ma
Shuai Zhong
Jin Wang
Dahai Yu
Michael K. Ng
Ping Luo
220
0
0
27 Aug 2025
Mobile-Agent-v3: Fundamental Agents for GUI Automation
Jiabo Ye
Xi Zhang
Haiyang Xu
Haowei Liu
Junyang Wang
...
Jitong Liao
Qi Zheng
Fei Huang
Jingren Zhou
Ming Yan
LLMAG LM&Ro
269
50
0
21 Aug 2025
Cybernaut: Towards Reliable Web Automation
Ankur Tomar
Hengyue Liang
Indranil Bhattacharya
Natalia Larios
Francesco Carbone
121
1
0
21 Aug 2025
UI-Venus Technical Report: Building High-performance UI Agents with RFT
Zhangxuan Gu
Zhengwen Zeng
Zhenyu Xu
Xingran Zhou
Shuheng Shen
...
Yuan Guo
Yong Deng
Zhenyu Guo
Liang Chen
Weiqiang Wang
LLMAG LM&Ro
330
21
0
14 Aug 2025
Quick on the Uptake: Eliciting Implicit Intents from Human Demonstrations for Personalized Mobile-Use Agents
Zheng Wu
Heyuan Huang
Y. Yang
Yuanyi Song
Xingyu Lou
Weiwen Liu
Weinan Zhang
Jun Wang
Zhuosheng Zhang
145
6
0
12 Aug 2025
OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use
Xueyu Hu
Tao Xiong
Biao Yi
Zishu Wei
Ruixuan Xiao
...
Zhou Zhao
Hongxia Yang
Fan Wu
Shengyu Zhang
Fei Wu
LLMAG LM&Ro AI4TS
244
32
0
06 Aug 2025
Uncertainty-Aware GUI Agent: Adaptive Perception through Component Recommendation and Human-in-the-Loop Refinement
Chao Hao
Shuai Wang
Kaiwen Zhou
208
8
0
06 Aug 2025
NaviMaster: Learning a Unified Policy for GUI and Embodied Navigation Tasks
Zhihao Luo
Wentao Yan
Jingyu Gong
Min Wang
Zhizhong Zhang
Xuhong Wang
Yuan Xie
Xin Tan
220
6
0
04 Aug 2025
Evaluation and Benchmarking of LLM Agents: A Survey
Mahmoud Mohammadi
Yipeng Li
Jane Lo
Wendy Yip
LLMAG ELM
422
40
0
29 Jul 2025
UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding
Shuquan Lian
Yuhang Wu
Jia Ma
Yifan Ding
Zihan Song
Bingqi Chen
Xiawu Zheng
Hui Li
LLMAG
684
13
0
29 Jul 2025
Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study
Yuqi Zhu
Yi Zhong
Jintian Zhang
Ziheng Zhang
Shuofei Qiao
Yujie Luo
Lun Du
Da Zheng
Ningyu Zhang
Huajun Chen
ELM
420
1
0
24 Jun 2025
LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization
Jiaqi Tang
Yu Xia
Yi-Feng Wu
Yuwei Hu
Yuhui Chen
...
Xiangyu Wu
Hao Lu
Yanqing Ma
Shiyin Lu
Qifeng Chen
348
9
0
11 Jun 2025
Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Scheduling System
Yuan Guo
Tingjia Miao
Zheng Wu
Pengzhou Cheng
Ming Zhou
Zhuosheng Zhang
227
6
0
10 Jun 2025
DeepShop: A Benchmark for Deep Research Shopping Agents
Yougang Lyu
Xiaoyu Zhang
Lingyong Yan
Maarten de Rijke
Zhaochun Ren
Xiuying Chen
341
14
0
03 Jun 2025
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
Zhong Zhang
Yaxi Lu
Yikun Fu
Yupeng Huo
Shenzhi Yang
...
Chongyi Wang
Chi Chen
Yuan Yao
Zhiyuan Liu
Maosong Sun
LLMAG ALM
358
24
0
02 Jun 2025
Robot Operation of Home Appliances by Reading User Manuals
Jian Zhang
Hanbo Zhang
Anxing Xiao
David Hsu
LM&Ro
349
1
0
26 May 2025
TransBench: Breaking Barriers for Transferable Graphical User Interface Agents in Dynamic Digital Environments
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yuheng Lu
Qian Yu
Hongru Wang
Zeming Liu
Wei Su
Yanping Liu
Yuhang Guo
Maocheng Liang
Yunhong Wang
Haifeng Wang
LLMAG
509
1
0
23 May 2025
ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search
Hyunseok Lee
Jeonghoon Kim
Beomjun Kim
Jihoon Tack
Chansong Jo
Jaehong Lee
Cheonbok Park
Sookyo In
Jinwoo Shin
Kang Min Yoo
398
5
0
21 May 2025
Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents
Pengzhou Cheng
Haowen Hu
Zheng Wu
Zongru Wu
Tianjie Ju
Zhuosheng Zhang
LLMAG AAML
401
5
0
20 May 2025
Mobile-Agent-V: A Video-Guided Approach for Effortless and Efficient Operational Knowledge Injection in Mobile Automation
Junyang Wang
Haiyang Xu
Xi Zhang
Ming Yan
Ji Zhang
Fei Huang
Jitao Sang
476
0
0
20 May 2025
From Assistants to Adversaries: Exploring the Security Risks of Mobile LLM Agents
Liangxuan Wu
Chao Wang
Tianming Liu
Yanjie Zhao
Haoyu Wang
AAML
457
14
0
19 May 2025
Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI
Benjamin Raphael Ernhofer
Daniil Prokhorov
Jannica Langner
Dominik Bollmann
336
1
0
09 May 2025
TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents
Bofei Zhang
Zirui Shang
Zhi Gao
Wang Zhang
Rui Xie
Xiaojian Ma
Tao Yuan
Xinxiao Wu
Song-Chun Zhu
Qing Li
LLMAG
482
21
0
17 Apr 2025
Towards Trustworthy GUI Agents: A Survey
Yucheng Shi
Wenhao Yu
Wenlin Yao
Wenhu Chen
Ninghao Liu
291
18
0
30 Mar 2025
VeriSafe Agent: Safeguarding Mobile GUI Agent via Logic-based Action Verification
Jungjae Lee
Dongjae Lee
Chihun Choi
Youngmin Im
Jaeyoung Wi
Kihong Heo
Sangeun Oh
Sunjae Lee
Insik Shin
LLMAG
368
5
0
24 Mar 2025
Are AI Agents interacting with Online Ads?
Andreas Stöckl
Joel Nitu
482
2
0
20 Mar 2025
CHOP: Mobile Operating Assistant with Constrained High-frequency Optimized Subtask Planning
Yuqi Zhou
Shuai Wang
Sunhao Dai
Qinglin Jia
Zhaocheng Du
Zhenhua Dong
Jun Xu
LM&Ro
316
4
0
05 Mar 2025
VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model
Jiani Zheng
Lu Wang
Fangkai Yang
Chen Zhang
Shansong Liu
Wenjie Yin
Qingwei Lin
Dongmei Zhang
Saravan Rajmohan
Qi Zhang
OffRL
320
15
0
26 Feb 2025
Beyond In-Distribution Success: Scaling Curves of CoT Granularity for Language Model Generalization
Ru Wang
Wei Huang
Selena Song
Haoyu Zhang
Yusuke Iwasawa
Y. Matsuo
Jiaxian Guo
OODD LRM
411
6
0
25 Feb 2025
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
Zhenhailong Wang
Haiyang Xu
Junyang Wang
Xi Zhang
Ming Yan
Junxuan Zhang
Fei Huang
Heng Ji
495
81
0
20 Jan 2025
SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation
International Conference on Learning Representations (ICLR), 2024
Jingxuan Chen
Derek Yuen
Bin Xie
Yue Yang
Gongwei Chen
...
Liqiang Nie
Yasheng Wang
Jianye Hao
Jun Wang
Youssef Attia El Hili
LLMAG
522
47
0
19 Oct 2024