ResearchTrend.AI

Understanding the Weakness of Large Language Model Agents within a Complex Android Environment

9 February 2024
Mingzhe Xing
Rongkai Zhang
Hui Xue
Qi Chen
Fan Yang
Zhengjin Xiao
    LLMAG
    ELM
    AAML
arXiv: 2402.06596

Papers citing "Understanding the Weakness of Large Language Model Agents within a Complex Android Environment"

19 / 19 papers shown
Title
AndroidGen: Building an Android Language Agent under Data Scarcity
Hanyu Lai
Junjie Gao
Xiao-Yang Liu
Y. Xu
S. Zhang
Yuxiao Dong
Jie Tang
LLMAG
72
0
0
27 Apr 2025
DioR: Adaptive Cognitive Detection and Contextual Retrieval Optimization for Dynamic Retrieval-Augmented Generation
Hanghui Guo
Jia Zhu
Shimin Di
Weijie Shi
Zhangze Chen
Jiajie Xu
28
0
0
14 Apr 2025
AgentSpec: Customizable Runtime Enforcement for Safe and Reliable LLM Agents
Haoyu Wang
Christopher M. Poskitt
Jun Sun
37
0
0
24 Mar 2025
Safeguarding Mobile GUI Agent via Logic-based Action Verification
Jungjae Lee
Dongjae Lee
Chihun Choi
Youngmin Im
Jaeyoung Wi
Kihong Heo
Sangeun Oh
Sunjae Lee
Insik Shin
LLMAG
75
0
0
24 Mar 2025
Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment
Gaole Dai
Shiqi Jiang
Ting Cao
Yuanchun Li
Y. Yang
Rui Tan
Mo Li
Lili Qiu
46
0
0
20 Mar 2025
Factorio Learning Environment
Jack Hopkins
Mart Bakler
Akbir Khan
LRM
AI4CE
LLMAG
50
0
0
06 Mar 2025
AutoEval: A Practical Framework for Autonomous Evaluation of Mobile Agents
Jiahui Sun
Zhichao Hua
Yubin Xia
45
0
0
04 Mar 2025
Harnessing Language for Coordination: A Framework and Benchmark for LLM-Driven Multi-Agent Control
Timothée Anne
Noah Syrkis
Meriem Elhosni
Florian Turati
Franck Legendre
Alain Jaquier
Sebastian Risi
LLMAG
90
2
0
16 Dec 2024
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
Davide Paglieri
Bartłomiej Cupiał
Samuel Coward
Ulyana Piterbarg
Maciej Wolczyk
...
Lerrel Pinto
Rob Fergus
Jakob Foerster
Jack Parker-Holder
Tim Rocktaschel
LLMAG
LRM
101
10
0
20 Nov 2024
Foundations and Recent Trends in Multimodal Mobile Agents: A Survey
Biao Wu
Yanda Li
Meng Fang
Zirui Song
Zhiwei Zhang
Yunchao Wei
L. Chen
LM&Ro
LLMAG
OffRL
AI4TS
39
3
0
04 Nov 2024
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
Yifan Xu
Xiao Liu
X. Sun
Siyi Cheng
Hao Yu
Hanyu Lai
Shudan Zhang
Dan Zhang
Jie Tang
Yuxiao Dong
LLMAG
44
7
0
31 Oct 2024
SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation
Jingxuan Chen
Derek Yuen
Bin Xie
Y. Yang
Gongwei Chen
...
Liqiang Nie
Yasheng Wang
Jianye Hao
Jun Wang
Kun Shao
LLMAG
35
5
0
19 Oct 2024
MobileViews: A Large-Scale Mobile GUI Dataset
Longxi Gao
Li Zhang
Shihe Wang
Shangguang Wang
Yuanchun Li
Mengwei Xu
28
5
0
22 Sep 2024
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents
Tianqi Xu
Linyao Chen
Dai-Jie Wu
Yanjun Chen
Zecheng Zhang
...
Shilong Liu
Bochen Qian
Philip H. S. Torr
Bernard Ghanem
G. Li
38
14
0
01 Jul 2024
MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents
Luyuan Wang
Yongyu Deng
Yiwei Zha
Guodong Mao
Qinmin Wang
Tianchen Min
Wei Chen
Shoufa Chen
LLMAG
40
12
0
12 Jun 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Christopher Rawles
Sarah Clinckemaillie
Yifan Chang
Jonathan Waltz
Gabrielle Lau
...
Daniel Toyama
Robert Berry
Divya Tyamagundlu
Timothy Lillicrap
Oriana Riva
LLMAG
57
44
0
23 May 2024
LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Automation Task Evaluation
Li Lyna Zhang
Shihe Wang
Xianqing Jia
Zhihan Zheng
Yun-Yu Yan
Longxi Gao
Yuanchun Li
Mengwei Xu
LLMAG
17
10
0
12 Apr 2024
Mobile-Env: Building Qualified Evaluation Benchmarks for LLM-GUI Interaction
Danyang Zhang
Zhennan Shen
Rui Xie
Situo Zhang
Tianbao Xie
...
Siyuan Chen
Lu Chen
Hongshen Xu
Ruisheng Cao
Kai Yu
ELM
LLMAG
26
3
0
14 May 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
223
2,413
0
06 Oct 2022