SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation

International Conference on Learning Representations (ICLR), 2024
19 October 2024
Jingxuan Chen
Derek Yuen
Bin Xie
Yue Yang
Gongwei Chen
Zhihao Wu
Li Yixing
Xurui Zhou
Weiwen Liu
Shuai Wang
Kaiwen Zhou
Rui Shao
Liqiang Nie
Yasheng Wang
Jianye Hao
Jun Wang
Youssef Attia El Hili
    LLMAG

Papers citing "SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation"

50 / 61 papers shown
Title
DigiData: Training and Evaluating General-Purpose Mobile Control Agents
Yuxuan Sun
Manchen Wang
Shengyi Qian
William R. Wong
Eric Gan
...
Allen Bolourchi
James Valori
Kevin Carlberg
Karl Ridgeway
Joseph Tighe
213
0
0
10 Nov 2025
Measuring the Security of Mobile LLM Agents under Adversarial Prompts from Untrusted Third-Party Channels
Chenghao Du
Quanfeng Huang
Tingxuan Tang
Zihao Wang
Adwait Nadkarni
Yue Xiao
AAML
193
0
0
31 Oct 2025
ComboBench: Can LLMs Manipulate Physical Devices to Play Virtual Reality Games?
Shuqing Li
Jiayi Yan
Chenyu Niu
Jen-tse Huang
Yun Peng
Wenxuan Wang
Yepang Liu
Michael R. Lyu
57
0
0
28 Oct 2025
Hi-Agent: Hierarchical Vision-Language Agents for Mobile Device Control
Zhe Wu
Hongjin Lu
Junliang Xing
C. Zhang
Yin Zhu
...
Kai Li
Kun Shao
Jianye Hao
Jun Wang
Yuanchun Shi
LM&Ro
76
0
0
16 Oct 2025
ColorBench: Benchmarking Mobile Agents with Graph-Structured Framework for Complex Long-Horizon Tasks
Yuanyi Song
Heyuan Huang
Qiqiang Lin
Yin Zhao
Xiangmou Qu
...
Zhuosheng Zhang
Jun Wang
Yong Yu
Weinan Zhang
Zhaoxiang Wang
LLMAG OffRL
100
1
0
16 Oct 2025
Say One Thing, Do Another? Diagnosing Reasoning-Execution Gaps in VLM-Powered Mobile-Use Agents
Lingzhong Dong
Ziqi Zhou
Shuaibo Yang
Haiyue Sheng
Pengzhou Cheng
Zongru Wu
Zheng Wu
Gongshen Liu
Zhuosheng Zhang
LRM
103
0
0
02 Oct 2025
Agent-ScanKit: Unraveling Memory and Reasoning of Multimodal Agents via Sensitivity Perturbations
Pengzhou Cheng
Lingzhong Dong
Zeng Wu
Zongru Wu
Zhuosheng Zhang
Chengwei Qin
Zhuosheng Zhang
Gongshen Liu
LLMAG
366
0
0
01 Oct 2025
MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents
P. Zhao
Guangyi Liu
Yaozhen Liang
Weiqing He
Z. Lu
...
Yaxuan Guo
Kexin Zhang
Hao Wang
Liang Liu
Yong Liu
LLMAG
52
1
0
08 Sep 2025
Mind the Third Eye! Benchmarking Privacy Awareness in MLLM-powered Smartphone Agents
Zhixin Lin
Jungang Li
Shidong Pan
Yibo Shi
Yue Yao
Dongliang Xu
89
2
0
27 Aug 2025
PerPilot: Personalizing VLM-based Mobile Agents via Memory and Exploration
Xin Wang
Zhiyao Cui
Hao Li
Ya Zeng
Chenxu Wang
...
Qiaosheng Zhang
Jinzhuo Liu
Siyue Ren
Shuyue Hu
Zhen Wang
40
1
0
25 Aug 2025
Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey
Rui Shao
W. Li
Lingsen Zhang
Renshan Zhang
Zhiyang Liu
Ran Chen
Liqiang Nie
LM&Ro
167
19
0
18 Aug 2025
FineState-Bench: A Comprehensive Benchmark for Fine-Grained State Control in GUI Agents
Fengxian Ji
Jingpu Yang
Zirui Song
Yuanxi Wang
Zhexuan Cui
Yuke Li
Qian Jiang
Miao Fang
Xiuying Chen
LLMAG
76
0
0
12 Aug 2025
MVISU-Bench: Benchmarking Mobile Agents for Real-World Tasks by Multi-App, Vague, Interactive, Single-App and Unethical Instructions
Zeyu Huang
Juyuan Wang
L. Chen
Boyi Xiao
Leng Cai
Yawen Zeng
Jin Xu
96
2
0
12 Aug 2025
OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use
Xueyu Hu
Tao Xiong
Biao Yi
Zishu Wei
Ruixuan Xiao
...
Zhou Zhao
Hongxia Yang
Fan Wu
Shengyu Zhang
Fei Wu
LLMAG LM&Ro AI4TS
190
28
0
06 Aug 2025
Evaluation and Benchmarking of LLM Agents: A Survey
Mahmoud Mohammadi
Yipeng Li
Jane Lo
Wendy Yip
LLMAG ELM
176
24
0
29 Jul 2025
MapAgent: Trajectory-Constructed Memory-Augmented Planning for Mobile Task Automation
Yi Kong
Dianxi Shi
Guoli Yang
Zhang ke-di
Chenlin Huang
Xiaopeng Li
Songchang Jin
LLMAG LM&Ro
321
2
0
29 Jul 2025
Deep Research Agents: A Systematic Examination And Roadmap
Y. Huang
Yihao Chen
Haozheng Zhang
Kang Li
Huichi Zhou
...
Lifeng Shang
Songcen Xu
Jianye Hao
Youssef Attia El Hili
Jun Wang
LLMAG
226
38
0
22 Jun 2025
Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Scheduling System
Yuan Guo
Tingjia Miao
Zheng Wu
Pengzhou Cheng
Ming Zhou
Zhuosheng Zhang
154
6
0
10 Jun 2025
FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents
Qinglong Yang
Haoming Li
Haotian Zhao
Xiaokai Yan
Jingtao Ding
Fengli Xu
Yong Li
LLMAG
107
1
0
09 Jun 2025
Evolutionary Perspectives on the Evaluation of LLM-Based AI Agents: A Comprehensive Survey
Jiachen Zhu
Menghui Zhu
Renting Rui
Rong Shan
Congmin Zheng
...
Jianghao Lin
Weiwen Liu
Ruiming Tang
Yong Yu
Weinan Zhang
LLMAG ELM
234
6
0
06 Jun 2025
FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents
B. Li
Yuheng Wang
Hao Fei
Juncheng Li
Wei Ji
Yang Deng
Wynne Hsu
161
1
0
02 Jun 2025
WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks
Atsuyuki Miyai
Zaiying Zhao
Kazuki Egashira
Atsuki Sato
Tatsumi Sunada
...
Mashiro Toyooka
Kunato Nishina
Ryoma Maeda
Kiyoharu Aizawa
Toshihiko Yamasaki
LLMAG
235
10
0
02 Jun 2025
Agent-SAMA: State-Aware Mobile Assistant
Linqiang Guo
Wei Liu
Yi Wen Heng
Tse-Hsun Chen
Yang Wang
LLMAG
266
0
0
29 May 2025
XBOUND: Exploring Capability Boundaries of Device-Control Agents at the State Level
Shaoqing Zhang
Kehai Chen
Zhuosheng Zhang
Rumei Li
Rongxiang Weng
Yang Xiang
Liqiang Nie
281
0
0
27 May 2025
GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Bin Xie
Rui Shao
Gongwei Chen
Kaiwen Zhou
Yinchuan Li
Jie Liu
Min Zhang
Liqiang Nie
LLMAG
221
13
0
22 May 2025
ViMo: A Generative Visual GUI World Model for App Agents
Dezhao Luo
Bohan Tang
Kang Li
Georgios Papoudakis
Jifei Song
S. Gong
Haifeng Zhang
Jun Wang
Cheng Deng
LM&Ro VGen
458
2
0
15 Apr 2025
LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant
Computer Vision and Pattern Recognition (CVPR), 2025
Wei Li
Bing Hu
Rui Shao
Leyang Shen
Liqiang Nie
223
30
0
05 Mar 2025
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
International Conference on Learning Representations (ICLR), 2024
Taiyi Wang
Zhihao Wu
Jianheng Liu
Jianye Hao
Jun Wang
Youssef Attia El Hili
OffRL
405
48
0
24 Feb 2025
AppVLM: A Lightweight Vision Language Model for Online App Control
Georgios Papoudakis
Thomas Coste
Zhihao Wu
Jianye Hao
Jun Wang
Youssef Attia El Hili
223
13
0
10 Feb 2025
GUI Agents with Foundation Models: A Comprehensive Survey
Shuai Wang
Wen Liu
Jingxuan Chen
Weinan Gan
Xingshan Zeng
...
Bin Wang
Chuhan Wu
Yasheng Wang
Ruiming Tang
Jianye Hao
LLMAG
351
67
0
07 Nov 2024
Foundations and Recent Trends in Multimodal Mobile Agents: A Survey
Biao Wu
Yanda Li
Meng Fang
Zirui Song
Zhiwei Zhang
Yunchao Wei
LM&Ro LLMAG OffRL AI4TS
299
18
0
04 Nov 2024
Lightweight Neural App Control
International Conference on Learning Representations (ICLR), 2024
Filippos Christianos
Georgios Papoudakis
Thomas Coste
Jianye Hao
Jun Wang
Youssef Attia El Hili
LM&Ro
209
9
0
23 Oct 2024
AgentSquare: Automatic LLM Agent Search in Modular Design Space
International Conference on Learning Representations (ICLR), 2024
Yu Shang
Yu Li
Keyu Zhao
Likai Ma
Qingbin Liu
Fengli Xu
Yong Li
LLMAG
371
47
0
08 Oct 2024
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2024
Hao Bai
Yifei Zhou
Mert Cemri
Jiayi Pan
Alane Suhr
Sergey Levine
Aviral Kumar
OffRL
276
121
0
14 Jun 2024
MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents
Luyuan Wang
Yongyu Deng
Yiwei Zha
Guodong Mao
Qinmin Wang
Tianchen Min
Wei Chen
Shoufa Chen
LLMAG
143
43
0
12 Jun 2024
GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Quanfeng Lu
Wenqi Shao
Zitao Liu
Lingxiao Du
Fanqing Meng
Boxuan Li
Botong Chen
Siyuan Huang
Kaipeng Zhang
Ping Luo
273
86
0
12 Jun 2024
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
Junyang Wang
Haiyang Xu
Haitao Jia
Xi Zhang
Ming Yan
Weizhou Shen
Ji Zhang
Fei Huang
Jitao Sang
LM&Ro LLMAG
274
133
0
03 Jun 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
International Conference on Learning Representations (ICLR), 2024
Christopher Rawles
Sarah Clinckemaillie
Yifan Chang
Jonathan Waltz
Gabrielle Lau
...
Daniel Toyama
Robert Berry
Divya Tyamagundlu
Timothy Lillicrap
Oriana Riva
LLMAG
400
158
0
23 May 2024
Benchmarking Mobile Device Control Agents across Diverse Configurations
Juyong Lee
Taywon Min
Minyong An
Dongyoon Hahm
Kimin Lee
Changyeon Kim
Kimin Lee
268
29
0
25 Apr 2024
LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Automation Task Evaluation
Li Zhang
Shihe Wang
Xianqing Jia
Zhihan Zheng
Yun-Yu Yan
Longxi Gao
Yuanchun Li
Mengwei Xu
LLMAG
164
0
0
12 Apr 2024
Autonomous Evaluation and Refinement of Digital Agents
Jiayi Pan
Yichi Zhang
Nicholas Tomlin
Yifei Zhou
Sergey Levine
Alane Suhr
ELM
430
93
0
09 Apr 2024
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
Knowledge Discovery and Data Mining (KDD), 2024
Hanyu Lai
Xiao Liu
Iat Long Iong
Shuntian Yao
Yuxuan Chen
...
Hao Yu
Hanchen Zhang
Xiaohan Zhang
Yuxiao Dong
Jie Tang
LM&Ro LLMAG
143
19
0
04 Apr 2024
Large Multimodal Agents: A Survey
Junlin Xie
Zhihong Chen
Ruifei Zhang
Xiang Wan
Guanbin Li
LM&Ro LLMAG
192
77
0
23 Feb 2024
Understanding the Weakness of Large Language Model Agents within a Complex Android Environment
Knowledge Discovery and Data Mining (KDD), 2024
Mingzhe Xing
Rongkai Zhang
Hui Xue
Qi Chen
Fan Yang
Zhengjin Xiao
LLMAG ELM AAML
168
48
0
09 Feb 2024
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Junyang Wang
Haiyang Xu
Jiabo Ye
Mingshi Yan
Weizhou Shen
Ji Zhang
Fei Huang
Jitao Sang
292
205
0
29 Jan 2024
GPT-4V(ision) is a Generalist Web Agent, if Grounded
International Conference on Machine Learning (ICML), 2024
Boyuan Zheng
Boyu Gou
Jihyung Kil
Huan Sun
Yu-Chuan Su
MLLM VLM LLMAG
298
385
0
03 Jan 2024
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
Jianwei Yang
Hao Zhang
Feng Li
Xueyan Zou
Chun-yue Li
Jianfeng Gao
MLLM VLM
328
268
0
17 Oct 2023
A Zero-Shot Language Agent for Computer Control with Structured Reflection
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Tao Li
Gang Li
Zhiwei Deng
Bryan Wang
Yang Li
LM&Ro LLMAG
321
28
0
12 Oct 2023
SmartPlay: A Benchmark for LLMs as Intelligent Agents
International Conference on Learning Representations (ICLR), 2023
Yue Wu
Xuan Tang
Tom Michael Mitchell
Yuanzhi Li
ELM LLMAG
445
103
0
02 Oct 2023
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
International Conference on Learning Representations (ICLR), 2023
Zhibin Gou
Zhihong Shao
Yeyun Gong
Haoran Pan
Yujiu Yang
Shiyu Huang
Nan Duan
Weizhu Chen
LRM AI4CE LLMAG
302
247
0
29 Sep 2023