ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.02713
  4. Cited By
Android in the Zoo: Chain-of-Action-Thought for GUI Agents

Android in the Zoo: Chain-of-Action-Thought for GUI Agents

5 March 2024
Jiwen Zhang
Jihao Wu
Yihua Teng
Minghui Liao
Nuo Xu
Xiao Xiao
Zhongyu Wei
Duyu Tang
    LLMAG
    LM&Ro
ArXivPDFHTML

Papers citing "Android in the Zoo: Chain-of-Action-Thought for GUI Agents"

46 / 46 papers shown
Title
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
X. Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Yu Jiang
ALM
ELM
84
0
0
26 Apr 2025
Guiding VLM Agents with Process Rewards at Inference Time for GUI Navigation
Guiding VLM Agents with Process Rewards at Inference Time for GUI Navigation
Zhiyuan Hu
Shiyun Xiong
Yifan Zhang
See-Kiong Ng
Anh Tuan Luu
Bo An
Shuicheng Yan
Bryan Hooi
21
0
0
22 Apr 2025
ViMo: A Generative Visual GUI World Model for App Agent
ViMo: A Generative Visual GUI World Model for App Agent
Dezhao Luo
Bohan Tang
Kang Li
Georgios Papoudakis
Jifei Song
S. Gong
Jianye Hao
Jun Wang
Kun Shao
LM&Ro
VGen
39
0
0
15 Apr 2025
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization
Junlei Zhang
Zichen Ding
Chang Ma
Zijie Chen
Qiushi Sun
Zhenzhong Lan
Junxian He
34
0
0
14 Apr 2025
Navi-plus: Managing Ambiguous GUI Navigation Tasks with Follow-up
Navi-plus: Managing Ambiguous GUI Navigation Tasks with Follow-up
Ziming Cheng
Zhiyuan Huang
Junting Pan
Zhaohui Hou
Mingjie Zhan
33
0
0
31 Mar 2025
Does Chain-of-Thought Reasoning Help Mobile GUI Agent? An Empirical Study
Does Chain-of-Thought Reasoning Help Mobile GUI Agent? An Empirical Study
Li Lyna Zhang
Longxi Gao
Mengwei Xu
LRM
37
0
0
21 Mar 2025
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
Shravan Nayak
Xiangru Jian
Kevin Qinghong Lin
Juan A. Rodriguez
Montek Kalsi
...
David Vazquez
Christopher Pal
Perouz Taslakian
Spandana Gella
Sai Rajeswar
76
0
0
19 Mar 2025
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
Muzhi Zhu
Yuzhuo Tian
Hao Chen
Chunluan Zhou
Qingpei Guo
Y. Liu
M. Yang
Chunhua Shen
MLLM
VLM
69
0
0
11 Mar 2025
SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant?
Xudong Lu
Haohao Gao
Renshou Wu
Shuai Ren
Xiaoxin Chen
Hongsheng Li
Fangyuan Li
ELM
47
0
0
08 Mar 2025
FedMABench: Benchmarking Mobile Agents on Decentralized Heterogeneous User Data
Wenhao Wang
Zijie Yu
Rui Ye
J. Zhang
S. Chen
Yanfeng Wang
FedML
42
0
0
07 Mar 2025
SpiritSight Agent: Advanced GUI Agent with One Look
SpiritSight Agent: Advanced GUI Agent with One Look
Zhiyuan Huang
Ziming Cheng
Junting Pan
Zhaohui Hou
Mingjie Zhan
LLMAG
80
2
0
05 Mar 2025
CHOP: Mobile Operating Assistant with Constrained High-frequency Optimized Subtask Planning
Yuqi Zhou
Shuai Wang
Sunhao Dai
Qinglin Jia
Zhaocheng Du
Zhenhua Dong
Jun Xu
LM&Ro
63
0
0
05 Mar 2025
Smoothing Grounding and Reasoning for MLLM-Powered GUI Agents with Query-Oriented Pivot Tasks
Zongru Wu
Pengzhou Cheng
Zheng Wu
Tianjie Ju
Zhuosheng Zhang
Gongshen Liu
LRM
32
1
0
01 Mar 2025
VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model
VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model
Jiani Zheng
Lu Wang
Fangkai Yang
C. Zhang
Lingrui Mei
Wenjie Yin
Qingwei Lin
Dongmei Zhang
Saravan Rajmohan
Qi Zhang
OffRL
56
2
0
26 Feb 2025
OS-Kairos: Adaptive Interaction for MLLM-Powered GUI Agents
OS-Kairos: Adaptive Interaction for MLLM-Powered GUI Agents
Pengzhou Cheng
Zheng Wu
Zongru Wu
Aston Zhang
Zhuosheng Zhang
Gongshen Liu
LLMAG
48
0
0
26 Feb 2025
MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions
MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions
Yuxuan Liu
Hongda Sun
Wei Liu
Jian Luan
Bo Du
Rui Yan
43
1
0
24 Feb 2025
Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
Vardaan Pahuja
Yadong Lu
Corby Rosset
Boyu Gou
Arindam Mitra
Spencer Whitehead
Yu Su
Ahmed Awadallah
LLMAG
LM&Ro
Presented at ResearchTrend Connect | LLMAG on 14 Mar 2025
138
3
1
20 Feb 2025
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Zehan Qi
Xiao-Chang Liu
Iat Long Iong
Hanyu Lai
X. Sun
...
Shuntian Yao
Tianjie Zhang
Wei Xu
J. Tang
Yuxiao Dong
90
14
0
28 Jan 2025
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
Y. Liu
Pengxiang Li
Zishu Wei
C. Xie
Xueyu Hu
Xinchen Xu
Shengyu Zhang
Xiaotian Han
Hongxia Yang
Fei Wu
LLMAG
LRM
50
11
0
08 Jan 2025
PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital
  World
PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World
Yanheng He
Jiahe Jin
Shijie Xia
Jiadi Su
Runze Fan
Haoyang Zou
Xiangkun Hu
Pengfei Liu
LLMAG
28
2
0
23 Dec 2024
Aria-UI: Visual Grounding for GUI Instructions
Aria-UI: Visual Grounding for GUI Instructions
Yuhao Yang
Yue Wang
Dongxu Li
Ziyang Luo
Bei Chen
C. Huang
Junnan Li
LM&Ro
LLMAG
100
14
0
20 Dec 2024
Falcon-UI: Understanding GUI Before Following User Instructions
Falcon-UI: Understanding GUI Before Following User Instructions
Huawen Shen
Chang-Shu Liu
Gengluo Li
Xinlong Wang
Yu Zhou
Can Ma
Xiangyang Ji
LLMAG
74
4
0
12 Dec 2024
GUI Agents with Foundation Models: A Comprehensive Survey
GUI Agents with Foundation Models: A Comprehensive Survey
Shuai Wang
W. Liu
Jingxuan Chen
Weinan Gan
Xingshan Zeng
...
Bin Wang
Chuhan Wu
Yasheng Wang
Ruiming Tang
Jianye Hao
LLMAG
56
12
0
07 Nov 2024
Foundations and Recent Trends in Multimodal Mobile Agents: A Survey
Foundations and Recent Trends in Multimodal Mobile Agents: A Survey
Biao Wu
Yanda Li
Meng Fang
Zirui Song
Zhiwei Zhang
Yunchao Wei
L. Chen
LM&Ro
LLMAG
OffRL
AI4TS
36
3
0
04 Nov 2024
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
Zhiyong Wu
Zhenyu Wu
Fangzhi Xu
Yian Wang
Qiushi Sun
...
Kanzhi Cheng
Zichen Ding
L. Chen
Paul Pu Liang
Yu Qiao
37
31
0
30 Oct 2024
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks
Lawrence Jang
Yinheng Li
Charles Ding
Justin Lin
Paul Pu Liang
Dan Zhao
Rogerio Bonatti
K. Koishida
21
5
0
24 Oct 2024
HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of
  Large Multimodal Models Through Coding Tasks
HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks
Fengji Zhang
Linquan Wu
Huiyu Bai
Guancheng Lin
Xiao Li
Xiao Yu
Yue Wang
Bei Chen
Jacky Keung
MLLM
ELM
LRM
24
0
0
16 Oct 2024
TinyClick: Single-Turn Agent for Empowering GUI Automation
TinyClick: Single-Turn Agent for Empowering GUI Automation
Pawel Pawlowski
Krystian Zawistowski
Wojciech Lapacz
Marcin Skorupa
Adam Wiacek
Sebastien Postansque
Jakub Hoscilowicz
MLLM
LLMAG
LRM
35
6
0
09 Oct 2024
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Boyu Gou
Ruohan Wang
Boyuan Zheng
Yanan Xie
Cheng Chang
Yiheng Shu
Huan Sun
Yu Su
LM&Ro
LLMAG
55
48
0
07 Oct 2024
Inferring Alt-text For UI Icons With Large Language Models During App
  Development
Inferring Alt-text For UI Icons With Large Language Models During App Development
Sabrina Haque
Christoph Csallner
VLM
21
0
0
26 Sep 2024
MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI
  Understanding
MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding
Qinzhuo Wu
Weikai Xu
Wei Liu
Tao Tan
Jianfeng Liu
Ang Li
Jian Luan
Bin Wang
Shuo Shang
VLM
19
10
0
23 Sep 2024
TinyAgent: Function Calling at the Edge
TinyAgent: Function Calling at the Edge
Lutfi Eren Erdogan
Nicholas Lee
Siddharth Jha
Sehoon Kim
Ryan Tabrizi
Suhong Moon
Coleman Hooper
Gopala Anumanchipalli
Kurt Keutzer
Amir Gholami
LLMAG
33
10
0
01 Sep 2024
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model
  Agents
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents
Tianqi Xu
Linyao Chen
Dai-Jie Wu
Yanjun Chen
Zecheng Zhang
...
Shilong Liu
Bochen Qian
Philip H. S. Torr
Bernard Ghanem
G. Li
32
14
0
01 Jul 2024
Identifying User Goals from UI Trajectories
Identifying User Goals from UI Trajectories
Omri Berkovitch
Sapir Caduri
Noam Kahlon
Anatoly Efros
Avi Caciularu
Ido Dagan
LLMAG
14
4
0
20 Jun 2024
GUI Action Narrator: Where and When Did That Action Take Place?
GUI Action Narrator: Where and When Did That Action Take Place?
Qinchen Wu
Difei Gao
Kevin Qinghong Lin
Zhuoyu Wu
Xiangwu Guo
Peiran Li
Weichen Zhang
Hengxu Wang
Mike Zheng Shou
29
3
0
19 Jun 2024
VideoGUI: A Benchmark for GUI Automation from Instructional Videos
VideoGUI: A Benchmark for GUI Automation from Instructional Videos
Kevin Qinghong Lin
Linjie Li
Difei Gao
Qinchen Wu
Mingyi Yan
Zhengyuan Yang
Lijuan Wang
Mike Zheng Shou
34
10
0
14 Jun 2024
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on
  Mobile Devices
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Quanfeng Lu
Wenqi Shao
Zitao Liu
Fanqing Meng
Boxuan Li
Botong Chen
Siyuan Huang
Kaipeng Zhang
Yu Qiao
Ping Luo
30
26
0
12 Jun 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Christopher Rawles
Sarah Clinckemaillie
Yifan Chang
Jonathan Waltz
Gabrielle Lau
...
Daniel Toyama
Robert Berry
Divya Tyamagundlu
Timothy Lillicrap
Oriana Riva
LLMAG
48
44
0
23 May 2024
Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs
Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs
Yu Xia
Rui Wang
Xu Liu
Mingyan Li
Tong Yu
Xiang Chen
Julian McAuley
Shuai Li
LRM
35
16
0
24 Apr 2024
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM
  Agents Exponentially Fast
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Ye Wang
Jing Jiang
Min-Bin Lin
LLMAG
LM&Ro
35
47
0
13 Feb 2024
UFO: A UI-Focused Agent for Windows OS Interaction
UFO: A UI-Focused Agent for Windows OS Interaction
Chaoyun Zhang
Liqun Li
Shilin He
Xu Zhang
Bo Qiao
...
Yu Kang
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
Qi Zhang
LLMAG
58
65
0
08 Feb 2024
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Gilles Baechler
Srinivas Sunkara
Maria Wang
Fedir Zubach
Hassan Mansoor
Vincent Etter
Victor Carbune
Jason Lin
Jindong Chen
Abhanshu Sharma
107
47
0
07 Feb 2024
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Kanzhi Cheng
Qiushi Sun
Yougang Chu
Fangzhi Xu
Yantao Li
Jianbing Zhang
Zhiyong Wu
LLMAG
162
137
0
17 Jan 2024
CogAgent: A Visual Language Model for GUI Agents
CogAgent: A Visual Language Model for GUI Agents
Wenyi Hong
Weihan Wang
Qingsong Lv
Jiazheng Xu
Wenmeng Yu
...
Juanzi Li
Bin Xu
Yuxiao Dong
Ming Ding
Jie Tang
MLLM
132
310
0
14 Dec 2023
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
Bryan Wang
Gang Li
Xin Zhou
Zhourong Chen
Tovi Grossman
Yang Li
150
152
0
07 Aug 2021
Screen Recognition: Creating Accessibility Metadata for Mobile
  Applications from Pixels
Screen Recognition: Creating Accessibility Metadata for Mobile Applications from Pixels
Xiaoyi Zhang
Lilian de Greef
Amanda Swearngin
Samuel White
Kyle I. Murray
...
Jeffrey Nichols
Jason Wu
Chris Fleizach
Aaron Everitt
Jeffrey P. Bigham
147
163
0
13 Jan 2021
1