SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation

International Conference on Learning Representations (ICLR), 2024

19 October 2024

Weiwen Liu

Rui Shao

Yasheng Wang

Jun Wang

Youssef Attia El Hili

LLMAG

Papers citing "SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation"

13 / 63 papers shown

SmartPlay: A Benchmark for LLMs as Intelligent Agents

International Conference on Learning Representations (ICLR), 2023

545

105

02 Oct 2023

ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving

International Conference on Learning Representations (ICLR), 2023

Zhihong Shao

Yujiu Yang

411

258

29 Sep 2023

You Only Look at Screens: Multimodal Chain-of-Action Agents

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Zhuosheng Zhang

Aston Zhang

LLMAG LM&Ro

466

163

20 Sep 2023

AutoDroid: LLM-powered Task Automation in Android

ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom), 2023

408

179

29 Aug 2023

ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate

Chi-Min Chan

Weize Chen

Yusheng Su

Jianxuan Yu

Wei Xue

Shan Zhang

Jie Fu

Zhiyuan Liu

ELM LLMAG ALM

261

725

14 Aug 2023

Large Language Models for Information Retrieval: A Survey

621

452

14 Aug 2023

BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents

...

Ran Xu

Silvio Savarese

220

100

11 Aug 2023

AgentBench: Evaluating LLMs as Agents

International Conference on Learning Representations (ICLR), 2023

...

524

494

07 Aug 2023

A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis

International Conference on Learning Representations (ICLR), 2023

Hiroki Furuta

566

309

24 Jul 2023

Android in the Wild: A Large-Scale Dataset for Android Device Control

Neural Information Processing Systems (NeurIPS), 2023

396

250

19 Jul 2023

Large Language Models as Tool Makers

International Conference on Learning Representations (ICLR), 2023

Tianle Cai

276

259

26 May 2023

Voyager: An Open-Ended Embodied Agent with Large Language Models

Linxi Fan

469

1,177

25 May 2023

CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society

Neural Information Processing Systems (NeurIPS), 2023

573

940

31 Mar 2023