SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation

International Conference on Learning Representations (ICLR), 2024
19 October 2024
Jingxuan Chen
Derek Yuen
Bin Xie
Yue Yang
Gongwei Chen
Zhihao Wu
Li Yixing
Xurui Zhou
Weiwen Liu
Shuai Wang
Kaiwen Zhou
Rui Shao
Liqiang Nie
Yasheng Wang
Jianye Hao
Jun Wang
Youssef Attia El Hili
    LLMAG

Papers citing "SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation"

13 / 63 papers shown
SmartPlay: A Benchmark for LLMs as Intelligent Agents
International Conference on Learning Representations (ICLR), 2023
Yue Wu
Xuan Tang
Tom Michael Mitchell
Yuanzhi Li
ELM, LLMAG
545
105
0
02 Oct 2023
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
International Conference on Learning Representations (ICLR), 2023
Zhibin Gou
Zhihong Shao
Yeyun Gong
Haoran Pan
Yujiu Yang
Shiyu Huang
Nan Duan
Weizhu Chen
LRM, AI4CE, LLMAG
411
258
0
29 Sep 2023
You Only Look at Screens: Multimodal Chain-of-Action Agents
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhuosheng Zhang
Aston Zhang
LLMAG, LM&Ro
466
163
0
20 Sep 2023
AutoDroid: LLM-powered Task Automation in Android
ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom), 2023
Hao Wen
Yuanchun Li
Guohong Liu
Shanhui Zhao
Tao Yu
Toby Jia-Jun Li
Shiqi Jiang
Yunhao Liu
Yaqin Zhang
Yunxin Liu
408
179
0
29 Aug 2023
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
Chi-Min Chan
Weize Chen
Yusheng Su
Jianxuan Yu
Wei Xue
Shan Zhang
Jie Fu
Zhiyuan Liu
ELM, LLMAG, ALM
261
725
0
14 Aug 2023
Large Language Models for Information Retrieval: A Survey
Yutao Zhu
Huaying Yuan
Shuting Wang
Jiongnan Liu
Wenhan Liu
Chenlong Deng
Haonan Chen
Zheng Liu
Zhicheng Dou
Ji-Rong Wen
KELM
621
452
0
14 Aug 2023
BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents
Zhiwei Liu
Weiran Yao
Jianguo Zhang
Le Xue
Shelby Heinecke
...
Ran Xu
P. Mùi
Haiquan Wang
Caiming Xiong
Silvio Savarese
LLMAG
220
100
0
11 Aug 2023
AgentBench: Evaluating LLMs as Agents
International Conference on Learning Representations (ICLR), 2023
Xiao-Yang Liu
Hao Yu
Hanchen Zhang
Yifan Xu
Xuanyu Lei
...
Yu-Chuan Su
Huan Sun
Shiyu Huang
Yuxiao Dong
Jie Tang
ELM, LLMAG
524
494
0
07 Aug 2023
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
International Conference on Learning Representations (ICLR), 2023
Izzeddin Gur
Hiroki Furuta
Austin Huang
Mustafa Safdari
Yutaka Matsuo
Douglas Eck
Aleksandra Faust
LM&Ro, LLMAG
566
309
0
24 Jul 2023
Android in the Wild: A Large-Scale Dataset for Android Device Control
Neural Information Processing Systems (NeurIPS), 2023
Christopher Rawles
Alice Li
Daniel Rodriguez
Oriana Riva
Timothy Lillicrap
LM&Ro
396
250
0
19 Jul 2023
Large Language Models as Tool Makers
International Conference on Learning Representations (ICLR), 2023
Tianle Cai
Xuezhi Wang
Tengyu Ma
Xinyun Chen
Denny Zhou
LLMAG
276
259
0
26 May 2023
Voyager: An Open-Ended Embodied Agent with Large Language Models
Guanzhi Wang
Yuqi Xie
Yunfan Jiang
Ajay Mandlekar
Chaowei Xiao
Yuke Zhu
Linxi Fan
Anima Anandkumar
LM&Ro, SyDa
469
1,177
0
25 May 2023
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
Neural Information Processing Systems (NeurIPS), 2023
Ge Li
Hasan Hammoud
Hani Itani
Dmitrii Khizbullin
Guohao Li
SyDa, ALM
573
940
0
31 Mar 2023