arXiv: 2411.04890
GUI Agents with Foundation Models: A Comprehensive Survey

7 November 2024
Shuai Wang
Wen Liu
Jingxuan Chen
Weinan Gan
Xingshan Zeng
Shuai Yu
Xinlong Hao
Youssef Attia El Hili
Yasheng Wang
Ruiming Tang
Bin Wang
Chuhan Wu
Jianye Hao
    LLMAG

Papers citing "GUI Agents with Foundation Models: A Comprehensive Survey"

50 / 77 papers shown
Improved Sample Complexity for Full Coverage in Compact and Continuous Spaces
Lyu Yuhuan
124
3
0
21 Nov 2025
DualTAP: A Dual-Task Adversarial Protector for Mobile MLLM Agents
Fuyao Zhang
Jiaming Zhang
C. Wang
Xiongtao Sun
Yurong Hao
Guowei Guan
Wenjie Li
Longtao Huang
Wei Yang Bryan Lim
AAML
140
0
0
17 Nov 2025
GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding
Shijie Zhou
Viet Dac Lai
Hao Tan
Jihyung Kil
Wanrong Zhu
Changyou Chen
Ruiyi Zhang
118
1
0
02 Nov 2025
CORE: Reducing UI Exposure in Mobile Agents via Collaboration Between Cloud and Local LLMs
Gucongcong Fan
Chaoyue Niu
Chengfei Lyu
Fan Wu
Guihai Chen
80
1
0
17 Oct 2025
Practical and Stealthy Touch-Guided Jailbreak Attacks on Deployed Mobile Vision-Language Agents
Renhua Ding
Xiao Yang
Zhengwei Fang
Jun Luo
Kun He
Jun Zhu
AAML
230
0
0
09 Oct 2025
Training-Free Group Relative Policy Optimization
Yuzheng Cai
Siqi Cai
Yuchen Shi
Zihan Xu
Lichao Chen
...
Zongyi Li
Haojia Lin
Yong Mao
Ke Li
Xing Sun
OffRL
147
1
0
09 Oct 2025
ReInAgent: A Context-Aware GUI Agent Enabling Human-in-the-Loop Mobile Task Navigation
Haitao Jia
Ming He
Zimo Yin
Likang Wu
Jianping Fan
Jitao Sang
68
0
0
09 Oct 2025
Say One Thing, Do Another? Diagnosing Reasoning-Execution Gaps in VLM-Powered Mobile-Use Agents
Lingzhong Dong
Ziqi Zhou
Shuaibo Yang
Haiyue Sheng
Pengzhou Cheng
Zongru Wu
Zheng Wu
Gongshen Liu
Zhuosheng Zhang
LRM
103
0
0
02 Oct 2025
PAL-UI: Planning with Active Look-back for Vision-Based GUI Agents
Zikang Liu
Junyi Li
Wayne Xin Zhao
Dawei Gao
Yaliang Li
Ji-Rong Wen
LLMAG
98
2
0
01 Oct 2025
Agent-ScanKit: Unraveling Memory and Reasoning of Multimodal Agents via Sensitivity Perturbations
Pengzhou Cheng
Lingzhong Dong
Zeng Wu
Zongru Wu
Zhuosheng Zhang
Chengwei Qin
Gongshen Liu
LLMAG
362
0
0
01 Oct 2025
Generalist Scanner Meets Specialist Locator: A Synergistic Coarse-to-Fine Framework for Robust GUI Grounding
Zhecheng Li
Guoxian Song
Yiwei Wang
Zhen Xiong
Junsong Yuan
Yujun Cai
73
2
0
29 Sep 2025
Secure and Efficient Access Control for Computer-Use Agents via Context Space
Haochen Gong
Chenxiao Li
Rui Chang
Wenbo Shen
LLMAG
122
0
0
26 Sep 2025
GUI-ReWalk: Massive Data Generation for GUI Agent via Stochastic Exploration and Intent-Aware Reasoning
Musen Lin
M. Liu
Taoran Lu
L. Yuan
Yiwei Liu
Haonan Xu
Yu Miao
Yuhao Chao
Ruoyao Xiao
116
0
0
19 Sep 2025
BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent
Shaojie Zhang
Ruoceng Zhang
Pei Fu
S. Wang
Jiahui Yang
...
Shiqi Cui
Bin Qin
Ying Huang
Zhenbo Luo
Jian Luan
LLMAG, MLLM
223
2
0
19 Sep 2025
Small Models, Big Results: Achieving Superior Intent Extraction through Decomposition
Danielle Cohen
Yoni Halpern
Noam Kahlon
Joel Oren
Omri Berkovitch
Sapir Caduri
Ido Dagan
Anatoly Efros
72
0
0
15 Sep 2025
Realistic Environmental Injection Attacks on GUI Agents
Yitong Zhang
Ximo Li
L. Cai
Jia Li
LLMAG, AAML
77
2
0
14 Sep 2025
Instruction Agent: Enhancing Agent with Expert Demonstration
Yinheng Li
Hailey Hultquist
Justin Wagle
K. Koishida
LLMAG
49
0
0
08 Sep 2025
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
Haoming Wang
Haoyang Zou
Huatong Song
J. Feng
Junjie Fang
...
Xianzheng Ma
Xiaojun Xiao
X. Y. Huang
Xinjie Chen
Yidi Du
LLMAG
189
34
0
02 Sep 2025
A Multimodal GUI Architecture for Interfacing with LLM-Based Conversational Assistants
Hans G.W. van Dam
115
0
0
31 Aug 2025
SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control
Quanfeng Lu
Zhantao Ma
Shuai Zhong
Jin Wang
Dahai Yu
Michael K. Ng
Ping Luo
148
0
0
27 Aug 2025
Mobile-Agent-v3: Fundamental Agents for GUI Automation
Jiabo Ye
Xi Zhang
Haiyang Xu
Haowei Liu
Junyang Wang
...
Jitong Liao
Qi Zheng
Fei Huang
Jingren Zhou
Ming Yan
LLMAG, LM&Ro
224
28
0
21 Aug 2025
Cybernaut: Towards Reliable Web Automation
Ankur Tomar
Hengyue Liang
Indranil Bhattacharya
Natalia Larios
Francesco Carbone
80
0
0
21 Aug 2025
UI-Venus Technical Report: Building High-performance UI Agents with RFT
Zhangxuan Gu
Zhengwen Zeng
Zhenyu Xu
Xingran Zhou
Shuheng Shen
...
Yuan Guo
Yong Deng
Zhenyu Guo
Liang Chen
Weiqiang Wang
LLMAG, LM&Ro
255
14
0
14 Aug 2025
Quick on the Uptake: Eliciting Implicit Intents from Human Demonstrations for Personalized Mobile-Use Agents
Zheng Wu
Heyuan Huang
Y. Yang
Yuanyi Song
Xingyu Lou
Weiwen Liu
Weinan Zhang
Jun Wang
Zhuosheng Zhang
100
4
0
12 Aug 2025
OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use
Xueyu Hu
Tao Xiong
Biao Yi
Zishu Wei
Ruixuan Xiao
...
Zhou Zhao
Hongxia Yang
Fan Wu
Shengyu Zhang
Fei Wu
LLMAG, LM&Ro, AI4TS
166
28
0
06 Aug 2025
Uncertainty-Aware GUI Agent: Adaptive Perception through Component Recommendation and Human-in-the-Loop Refinement
Chao Hao
Shuai Wang
Kaiwen Zhou
138
7
0
06 Aug 2025
NaviMaster: Learning a Unified Policy for GUI and Embodied Navigation Tasks
Zhihao Luo
Wentao Yan
Jingyu Gong
Min Wang
Zhizhong Zhang
Xuhong Wang
Yuan Xie
Xin Tan
134
4
0
04 Aug 2025
Evaluation and Benchmarking of LLM Agents: A Survey
Mahmoud Mohammadi
Yipeng Li
Jane Lo
Wendy Yip
LLMAG, ELM
172
24
0
29 Jul 2025
UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding
Shuquan Lian
Yuhang Wu
Jia Ma
Yifan Ding
Zihan Song
Bingqi Chen
Xiawu Zheng
Hui Li
LLMAG
536
10
0
29 Jul 2025
Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study
Yuqi Zhu
Yi Zhong
Jintian Zhang
Ziheng Zhang
Shuofei Qiao
Yujie Luo
Lun Du
Da Zheng
Ningyu Zhang
Huajun Chen
ELM
280
1
0
24 Jun 2025
LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization
Jiaqi Tang
Yu Xia
Yi-Feng Wu
Yuwei Hu
Yuhui Chen
...
Xiangyu Wu
Hao Lu
Yanqing Ma
Shiyin Lu
Qifeng Chen
249
8
0
11 Jun 2025
Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Scheduling System
Yuan Guo
Tingjia Miao
Zheng Wu
Pengzhou Cheng
Ming Zhou
Zhuosheng Zhang
150
6
0
10 Jun 2025
DeepShop: A Benchmark for Deep Research Shopping Agents
Yougang Lyu
Xiaoyu Zhang
Lingyong Yan
Maarten de Rijke
Zhaochun Ren
Xiuying Chen
265
12
0
03 Jun 2025
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
Zhong Zhang
Yaxi Lu
Yikun Fu
Yupeng Huo
Shenzhi Yang
...
Chongyi Wang
Chi Chen
Yuan Yao
Zhiyuan Liu
Maosong Sun
LLMAG, ALM
263
16
0
02 Jun 2025
Robot Operation of Home Appliances by Reading User Manuals
Jian Zhang
Hanbo Zhang
Anxing Xiao
David Hsu
LM&Ro
255
1
0
26 May 2025
TransBench: Breaking Barriers for Transferable Graphical User Interface Agents in Dynamic Digital Environments
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yuheng Lu
Qian Yu
Hongru Wang
Zeming Liu
Wei Su
Yanping Liu
Yuhang Guo
Maocheng Liang
Yunhong Wang
Haifeng Wang
LLMAG
411
1
0
23 May 2025
ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search
Hyunseok Lee
Jeonghoon Kim
Beomjun Kim
Jihoon Tack
Chansong Jo
Jaehong Lee
Cheonbok Park
Sookyo In
Jinwoo Shin
Kang Min Yoo
328
5
0
21 May 2025
Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents
Pengzhou Cheng
Haowen Hu
Zheng Wu
Zongru Wu
Tianjie Ju
Zhuosheng Zhang
LLMAG, AAML
339
5
0
20 May 2025
Mobile-Agent-V: A Video-Guided Approach for Effortless and Efficient Operational Knowledge Injection in Mobile Automation
Junyang Wang
Haiyang Xu
Xi Zhang
Ming Yan
Ji Zhang
Fei Huang
Jitao Sang
393
0
0
20 May 2025
From Assistants to Adversaries: Exploring the Security Risks of Mobile LLM Agents
Liangxuan Wu
Chao Wang
Tianming Liu
Yanjie Zhao
Haoyu Wang
AAML
387
9
0
19 May 2025
Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI
Benjamin Raphael Ernhofer
Daniil Prokhorov
Jannica Langner
Dominik Bollmann
241
1
0
09 May 2025
TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents
Bofei Zhang
Zirui Shang
Zhi Gao
Wang Zhang
Rui Xie
Xiaojian Ma
Tao Yuan
Xinxiao Wu
Song-Chun Zhu
Qing Li
LLMAG
308
21
0
17 Apr 2025
Towards Trustworthy GUI Agents: A Survey
Yucheng Shi
Wenhao Yu
Wenlin Yao
Wenhu Chen
Ninghao Liu
209
16
0
30 Mar 2025
VeriSafe Agent: Safeguarding Mobile GUI Agent via Logic-based Action Verification
Jungjae Lee
Dongjae Lee
Chihun Choi
Youngmin Im
Jaeyoung Wi
Kihong Heo
Sangeun Oh
Sunjae Lee
Insik Shin
LLMAG
266
5
0
24 Mar 2025
Are AI Agents interacting with Online Ads?
Andreas Stöckl
Joel Nitu
376
2
0
20 Mar 2025
CHOP: Mobile Operating Assistant with Constrained High-frequency Optimized Subtask Planning
Yuqi Zhou
Shuai Wang
Sunhao Dai
Qinglin Jia
Zhaocheng Du
Zhenhua Dong
Jun Xu
LM&Ro
277
4
0
05 Mar 2025
VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model
Jiani Zheng
Lu Wang
Fangkai Yang
Chen Zhang
Shansong Liu
Wenjie Yin
Qingwei Lin
Dongmei Zhang
Saravan Rajmohan
Qi Zhang
OffRL
246
13
0
26 Feb 2025
Beyond In-Distribution Success: Scaling Curves of CoT Granularity for Language Model Generalization
Ru Wang
Wei Huang
Selena Song
Haoyu Zhang
Yusuke Iwasawa
Y. Matsuo
Jiaxian Guo
OODD, LRM
332
4
0
25 Feb 2025
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
Zhenhailong Wang
Haiyang Xu
Junyang Wang
Xi Zhang
Ming Yan
Junxuan Zhang
Fei Huang
Heng Ji
389
68
0
20 Jan 2025
SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation
International Conference on Learning Representations (ICLR), 2024
Jingxuan Chen
Derek Yuen
Bin Xie
Yue Yang
Gongwei Chen
...
Liqiang Nie
Yasheng Wang
Jianye Hao
Jun Wang
Youssef Attia El Hili
LLMAG
358
41
0
19 Oct 2024