2403.02713
Cited By
v1
v2 (latest)
Android in the Zoo: Chain-of-Action-Thought for GUI Agents
5 March 2024
Jiwen Zhang
Jihao Wu
Yihua Teng
Minghui Liao
Nuo Xu
Xiao Xiao
Zhongyu Wei
Duyu Tang
LLMAG
LM&Ro
ArXiv (abs)
PDF
HTML
Github (101★)
Papers citing
"Android in the Zoo: Chain-of-Action-Thought for GUI Agents"
50 / 85 papers shown
Reinforcement Learning for Large Model: A Survey
Weijia Wu
Chen Gao
Joya Chen
Kevin Lin
Qingwei Meng
Yiming Zhang
Yuke Qiu
Hong Zhou
Mike Zheng Shou
420
2
0
24 Dec 2025
Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation
Zehao Deng
Tianjie Ju
Zheng Wu
Zhuosheng Zhang
Gongshen Liu
OffRL
120
0
0
27 Nov 2025
A Variance-Based Analysis of Sample Complexity for Grid Coverage
Lyu Yuhuan
244
5
0
21 Nov 2025
AUTO-Explorer: Automated Data Collection for GUI Agent
Xiangwu Guo
Difei Gao
Mike Zheng Shou
LLMAG
213
2
0
09 Nov 2025
SCoPE VLM: Selective Context Processing for Efficient Document Navigation in Vision-Language Models
Gyubeum Lim
Yemo Koo
Vijay Krishna Madisetti
162
0
0
22 Oct 2025
ColorAgent: Building A Robust, Personalized, and Interactive OS Agent
Ning Li
Qiqiang Lin
Zheng Wu
Xiaoyun Mo
Weiming Zhang
...
Xingyu Lou
Jun Wang
Weiwen Liu
Zhuosheng Zhang
Weinan Zhang
LLMAG
VLM
230
3
0
22 Oct 2025
MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites
Zhenxin Lei
Zhangwei Gao
Changyao Tian
Erfei Cui
Guanzhou Chen
...
Xiangyu Zhao
Jiayi Ji
Yu Qiao
Wenhai Wang
Gen Luo
VLM
316
0
0
14 Oct 2025
Agent-ScanKit: Unraveling Memory and Reasoning of Multimodal Agents via Sensitivity Perturbations
Pengzhou Cheng
Lingzhong Dong
Zeng Wu
Zongru Wu
Zhuosheng Zhang
Chengwei Qin
Gongshen Liu
LLMAG
446
2
0
01 Oct 2025
GUI-Shepherd: Reliable Process Reward and Verification for Long-Sequence GUI Tasks
Cong Chen
Kaixiang Ji
Hao Zhong
Huanyi Zheng
Anzhou Li
...
Cheng Zou
Jiajia Liu
Jingdong Chen
Hao Chen
Chunhua Shen
ALM
197
2
0
28 Sep 2025
Robust, Observable, and Evolvable Agentic Systems Engineering: A Principled Framework Validated via the Fairy GUI Agent
Jiazheng Sun
Te Yang
Xu Han
Jiayang Niu
Mingxuan Li
Ruimeng Yang
Yongyong Lu
Xin Peng
LLMAG
188
0
0
25 Sep 2025
UIPro: Unleashing Superior Interaction Capability For GUI Agents
Hongxin Li
Jingran Su
Jingfan Chen
Zheng Ju
Yuntao Chen
Qing Li
Zhaoxiang Zhang
LLMAG
342
0
0
22 Sep 2025
GUI-ReWalk: Massive Data Generation for GUI Agent via Stochastic Exploration and Intent-Aware Reasoning
Musen Lin
M. Liu
Taoran Lu
L. Yuan
Yiwei Liu
Haonan Xu
Yu Miao
Yuhao Chao
Ruoyao Xiao
231
3
0
19 Sep 2025
See, Think, Act: Teaching Multimodal Agents to Effectively Interact with GUI by Identifying Toggles
Zongru Wu
Rui Mao
Zhiyuan Tian
Pengzhou Cheng
Tianjie Ju
Zheng Wu
Lingzhong Dong
Haiyue Sheng
Zhuosheng Zhang
Gongshen Liu
193
1
0
17 Sep 2025
OmniActor: A Generalist GUI and Embodied Agent for 2D&3D Worlds
Longrong Yang
Zhixiong Zeng
Yufeng Zhong
Jing Huang
Liming Zheng
Lei Chen
Haibo Qiu
Zequn Qin
Lin Ma
Xi Li
LLMAG
LM&Ro
217
4
0
02 Sep 2025
UItron: Foundational GUI Agent with Advanced Perception and Planning
Zhixiong Zeng
Jing Huang
Liming Zheng
Wenkang Han
Yufeng Zhong
Lei Chen
Longrong Yang
Yingjie Chu
Yuzhi He
Lin Ma
LLMAG
242
14
0
29 Aug 2025
SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control
Quanfeng Lu
Zhantao Ma
Shuai Zhong
Jin Wang
Dahai Yu
Michael K. Ng
Ping Luo
297
0
0
27 Aug 2025
Structuring GUI Elements through Vision Language Models: Towards Action Space Generation
Yi Xu
Yesheng Zhang
Jiajia Liu
Jingdong Chen
215
0
0
22 Aug 2025
UI-Venus Technical Report: Building High-performance UI Agents with RFT
Zhangxuan Gu
Zhengwen Zeng
Zhenyu Xu
Xingran Zhou
Shuheng Shen
...
Yuan Guo
Yong Deng
Zhenyu Guo
Liang Chen
Weiqiang Wang
LLMAG
LM&Ro
499
39
0
14 Aug 2025
MVISU-Bench: Benchmarking Mobile Agents for Real-World Tasks by Multi-App, Vague, Interactive, Single-App and Unethical Instructions
Zeyu Huang
Juyuan Wang
L. Chen
Boyi Xiao
Leng Cai
Yawen Zeng
Jin Xu
241
4
0
12 Aug 2025
OpenCUA: Open Foundations for Computer-Use Agents
Xinyuan Wang
Bowen Wang
Dunjie Lu
Junlin Yang
Tianbao Xie
...
Victor Zhong
Flood Sung
Y.Charles
Zhilin Yang
Tao Yu
ELM
VLM
351
55
0
12 Aug 2025
Evolving in Tasks: Empowering the Multi-modality Large Language Model as the Computer Use Agent
Liang Tang
Shuxian Li
Yuhao Cheng
Yukang Huo
Zhepeng Wang
Yiqiang Yan
Kaer Huang
Yanzhe Jing
330
6
0
06 Aug 2025
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
Zeyi Sun
Ziyu Liu
Yuhang Zang
Yuhang Cao
Xiaoyi Dong
Tong Wu
Dahua Lin
Yuan Liu
LLMAG
296
29
0
06 Aug 2025
OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use
Xueyu Hu
Tao Xiong
Biao Yi
Zishu Wei
Ruixuan Xiao
...
Zhou Zhao
Hongxia Yang
Fan Wu
Shengyu Zhang
Fei Wu
LLMAG
LM&Ro
AI4TS
382
43
0
06 Aug 2025
NaviMaster: Learning a Unified Policy for GUI and Embodied Navigation Tasks
Zhihao Luo
Wentao Yan
Jingyu Gong
Min Wang
Zhizhong Zhang
Xuhong Wang
Yuan Xie
Xin Tan
260
9
0
04 Aug 2025
OS-MAP: How Far Can Computer-Using Agents Go in Breadth and Depth?
Xuetian Chen
Yinghao Chen
Xinfeng Yuan
Zhuo Peng
Lu Chen
...
Tianbao Xie
Zhiyong Wu
Qiushi Sun
Biqing Qi
Bowen Zhou
267
3
0
25 Jul 2025
LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization
Jiaqi Tang
Yu Xia
Yi-Feng Wu
Yuwei Hu
Yuhui Chen
...
Xiangyu Wu
Hao Lu
Yanqing Ma
Shiyin Lu
Qifeng Chen
403
11
0
11 Jun 2025
Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Scheduling System
Yuan Guo
Tingjia Miao
Zheng Wu
Pengzhou Cheng
Ming Zhou
Zhuosheng Zhang
353
7
0
10 Jun 2025
GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior
Penghao Wu
Shengnan Ma
Bo Wang
Jiaheng Yu
Lewei Lu
Ziwei Liu
312
12
0
09 Jun 2025
Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation
Yuyang Wanyan
Xi Zhang
Haiyang Xu
Haowei Liu
Junyang Wang
...
Ming Yan
Fei Huang
Xiaoshan Yang
Weiming Dong
Changsheng Xu
LLMAG
LRM
450
16
0
05 Jun 2025
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
Qianhui Wu
Kanzhi Cheng
Rui Yang
Chaoyun Zhang
Jianwei Yang
...
Huan Zhang
Tong Zhang
Jianbing Zhang
Dongmei Zhang
J. Gao
LM&Ro
362
49
0
03 Jun 2025
VisuRiddles: Fine-grained Perception is a Primary Bottleneck for Multimodal Large Language Models in Abstract Visual Reasoning
Hao Yan
Handong Zheng
Hao Wang
Liang Yin
Xingchen Liu
...
Minghui Liao
Chao Weng
Wei Chen
Yuliang Liu
Xiang Bai
LRM
521
5
0
03 Jun 2025
FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents
B. Li
Yuheng Wang
Hao Fei
Juncheng Li
Wei Ji
Yang Deng
Wynne Hsu
293
2
0
02 Jun 2025
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
Zhong Zhang
Yaxi Lu
Yikun Fu
Yupeng Huo
Shenzhi Yang
...
Chongyi Wang
Chi Chen
Yuan Yao
Zhiyuan Liu
Maosong Sun
LLMAG
ALM
419
32
0
02 Jun 2025
ZeroGUI: Automating Online GUI Learning at Zero Human Cost
Chenyu Yang
Shiqian Su
Shi-Qi Liu
Xuan Dong
Yue Yu
...
Hao Li
Wenhai Wang
Yu Qiao
Xizhou Zhu
Jifeng Dai
OffRL
391
20
0
29 May 2025
XBOUND: Exploring Capability Boundaries of Device-Control Agents at the State Level
Shaoqing Zhang
Kehai Chen
Zhuosheng Zhang
Rumei Li
Rongxiang Weng
Yang Xiang
Liqiang Nie
449
0
0
27 May 2025
MMTBENCH: A Unified Benchmark for Complex Multimodal Table Reasoning
Prasham Yatinkumar Titiya
Jainil Trivedi
Chitta Baral
Vivek Gupta
LMTD
282
8
0
27 May 2025
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
Jiaqi Wang
Kevin Qinghong Lin
James Cheng
Mike Zheng Shou
OffRL
ReLM
LRM
681
13
0
22 May 2025
Building a Stable Planner: An Extended Finite State Machine Based Planning Module for Mobile GUI Agent
Fanglin Mo
Junzhe Chen
Haoxuan Zhu
Xuming Hu
LLMAG
347
3
0
20 May 2025
MedBrowseComp: Benchmarking Medical Deep Research and Computer Use
Shan Chen
Pedro Moreira
Yuxin Xiao
Sam Schmidgall
J. Warner
Hugo J. W. L. Aerts
Thomas Hartvigsen
Jack Gallifant
Danielle S. Bitterman
ELM
380
14
0
20 May 2025
Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents
Pengzhou Cheng
Haowen Hu
Zheng Wu
Zongru Wu
Tianjie Ju
Zhuosheng Zhang
LLMAG
AAML
446
8
0
20 May 2025
Mobile-Agent-V: A Video-Guided Approach for Effortless and Efficient Operational Knowledge Injection in Mobile Automation
Junyang Wang
Haiyang Xu
Xi Zhang
Ming Yan
Ji Zhang
Fei Huang
Jitao Sang
559
0
0
20 May 2025
Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis
Tianbao Xie
Jiaqi Deng
Xiaochuan Li
Junlin Yang
Haoyuan Wu
...
Yiheng Xu
Junli Wang
Doyen Sahoo
Tao Yu
Caiming Xiong
551
76
0
19 May 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Xuzhao Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Qi Zhang
Tat-Seng Chua
Tianwei Zhang
ALM
ELM
592
28
0
26 Apr 2025
Guiding VLM Agents with Process Rewards at Inference Time for GUI Navigation
Zhiyuan Hu
Shiyun Xiong
Yifan Zhang
See-Kiong Ng
Anh Tuan Luu
Jingyi Wang
Shuicheng Yan
Bryan Hooi
403
5
0
22 Apr 2025
ViMo: A Generative Visual GUI World Model for App Agents
Dezhao Luo
Bohan Tang
Kang Li
Georgios Papoudakis
Jifei Song
S. Gong
Haifeng Zhang
Jun Wang
Cheng Deng
LM&Ro
VGen
624
12
0
15 Apr 2025
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization
Junlei Zhang
Zichen Ding
Chang Ma
Zijie Chen
Qiushi Sun
Zhenzhong Lan
Junxian He
1.2K
10
0
14 Apr 2025
Navi-plus: Managing Ambiguous GUI Navigation Tasks with Follow-up Questions
Ziming Cheng
Zhiyuan Huang
Junting Pan
Zhaohui Hou
Mingjie Zhan
460
5
0
31 Mar 2025
A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models
Liangbo Ning
Ziran Liang
Zhuohang Jiang
Haohao Qu
Yujuan Ding
...
Xiao Wei
Shanru Lin
Hui Liu
Philip S. Yu
Qing Li
LLMAG
LM&Ro
835
78
0
30 Mar 2025
Does Chain-of-Thought Reasoning Help Mobile GUI Agent? An Empirical Study
Li Zhang
Longxi Gao
Mengwei Xu
LRM
253
9
0
21 Mar 2025
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
Shravan Nayak
Xiangru Jian
Kevin Qinghong Lin
Juan A. Rodriguez
Montek Kalsi
...
David Vazquez
Christopher Pal
Perouz Taslakian
Spandana Gella
Sai Rajeswar
1.4K
40
0
19 Mar 2025