arXiv: 2403.02713

Android in the Zoo: Chain-of-Action-Thought for GUI Agents

5 March 2024

Papers citing "Android in the Zoo: Chain-of-Action-Thought for GUI Agents"

50 / 85 papers shown

Reinforcement Learning for Large Model: A Survey

Mike Zheng Shou

420

2

0

24 Dec 2025

Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation

Zhuosheng Zhang

120

0

0

27 Nov 2025

A Variance-Based Analysis of Sample Complexity for Grid Coverage

244

5

0

21 Nov 2025

AUTO-Explorer: Automated Data Collection for GUI Agent

Mike Zheng Shou

213

2

0

09 Nov 2025

SCoPE VLM: Selective Context Processing for Efficient Document Navigation in Vision-Language Models

Vijay Krishna Madisetti

162

0

0

22 Oct 2025

ColorAgent: Building A Robust, Personalized, and Interactive OS Agent

...

Zhuosheng Zhang

230

3

0

22 Oct 2025

MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites

...

316

0

0

14 Oct 2025

Agent-ScanKit: Unraveling Memory and Reasoning of Multimodal Agents via Sensitivity Perturbations

Zhuosheng Zhang

446

2

0

01 Oct 2025

GUI-Shepherd: Reliable Process Reward and Verification for Long-Sequence GUI Tasks

...

197

2

0

28 Sep 2025

Robust, Observable, and Evolvable Agentic Systems Engineering: A Principled Framework Validated via the Fairy GUI Agent

188

0

0

25 Sep 2025

UIPro: Unleashing Superior Interaction Capability For GUI Agents

Zhaoxiang Zhang

342

0

0

22 Sep 2025

GUI-ReWalk: Massive Data Generation for GUI Agent via Stochastic Exploration and Intent-Aware Reasoning

231

3

0

19 Sep 2025

See, Think, Act: Teaching Multimodal Agents to Effectively Interact with GUI by Identifying Toggles

Zhuosheng Zhang

193

1

0

17 Sep 2025

OmniActor: A Generalist GUI and Embodied Agent for 2D&3D Worlds

217

4

0

02 Sep 2025

UItron: Foundational GUI Agent with Advanced Perception and Planning

242

14

0

29 Aug 2025

SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control

297

0

0

27 Aug 2025

Structuring GUI Elements through Vision Language Models: Towards Action Space Generation

215

0

0

22 Aug 2025

UI-Venus Technical Report: Building High-performance UI Agents with RFT

...

499

39

0

14 Aug 2025

MVISU-Bench: Benchmarking Mobile Agents for Real-World Tasks by Multi-App, Vague, Interactive, Single-App and Unethical Instructions

241

4

0

12 Aug 2025

OpenCUA: Open Foundations for Computer-Use Agents

...

351

55

0

12 Aug 2025

Evolving in Tasks: Empowering the Multi-modality Large Language Model as the Computer Use Agent

330

6

0

06 Aug 2025

SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience

296

29

0

06 Aug 2025

OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use

...

LLMAG LM&Ro AI4TS

382

43

0

06 Aug 2025

NaviMaster: Learning a Unified Policy for GUI and Embodied Navigation Tasks

Wentao Yan and Jingyu Gong

260

9

0

04 Aug 2025

OS-MAP: How Far Can Computer-Using Agents Go in Breadth and Depth?

...

267

3

0

25 Jul 2025

LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization

...

403

11

0

11 Jun 2025

Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Scheduling System

Zhuosheng Zhang

353

7

0

10 Jun 2025

GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior

312

12

0

09 Jun 2025

Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation

...

450

16

0

05 Jun 2025

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

...

362

49

0

03 Jun 2025

VisuRiddles: Fine-grained Perception is a Primary Bottleneck for Multimodal Large Language Models in Abstract Visual Reasoning

...

521

5

0

03 Jun 2025

FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents

293

2

0

02 Jun 2025

AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning

...

419

32

0

02 Jun 2025

ZeroGUI: Automating Online GUI Learning at Zero Human Cost

...

391

20

0

29 May 2025

XBOUND: Exploring Capability Boundaries of Device-Control Agents at the State Level

Zhuosheng Zhang

449

0

0

27 May 2025

MMTBENCH: A Unified Benchmark for Complex Multimodal Table Reasoning

Prasham Yatinkumar Titiya

282

8

0

27 May 2025

Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

Kevin Qinghong Lin

Mike Zheng Shou

681

13

0

22 May 2025

Building a Stable Planner: An Extended Finite State Machine Based Planning Module for Mobile GUI Agent

347

3

0

20 May 2025

MedBrowseComp: Benchmarking Medical Deep Research and Computer Use

Hugo J. W. L. Aerts

Thomas Hartvigsen

Danielle S. Bitterman

380

14

0

20 May 2025

Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents

Zhuosheng Zhang

446

8

0

20 May 2025

Mobile-Agent-V: A Video-Guided Approach for Effortless and Efficient Operational Knowledge Injection in Mobile Automation

559

0

0

20 May 2025

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

...

551

76

0

19 May 2025

Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks

...

592

28

0

26 Apr 2025

Guiding VLM Agents with Process Rewards at Inference Time for GUI Navigation

403

5

0

22 Apr 2025

ViMo: A Generative Visual GUI World Model for App Agents

Georgios Papoudakis

624

12

0

15 Apr 2025

Breaking the Data Barrier -- Building GUI Agents Through Task Generalization

1.2K

10

0

14 Apr 2025

Navi-plus: Managing Ambiguous GUI Navigation Tasks with Follow-up Questions

460

5

0

31 Mar 2025

A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models

...

835

78

0

30 Mar 2025

Does Chain-of-Thought Reasoning Help Mobile GUI Agent? An Empirical Study

253

9

0

21 Mar 2025

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction

Kevin Qinghong Lin

Juan A. Rodriguez

...

Christopher Pal

Perouz Taslakian

1.4K

40

0

19 Mar 2025
