2411.04890
GUI Agents with Foundation Models: A Comprehensive Survey
7 November 2024
Shuai Wang
Wen Liu
Jingxuan Chen
Weinan Gan
Xingshan Zeng
Shuai Yu
Xinlong Hao
Youssef Attia El Hili
Yasheng Wang
Ruiming Tang
Bin Wang
Chuhan Wu
Jianye Hao
LLMAG
Papers citing
"GUI Agents with Foundation Models: A Comprehensive Survey"
27 / 77 papers shown
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
Junyang Wang
Haiyang Xu
Haitao Jia
Xi Zhang
Ming Yan
Weizhou Shen
Ji Zhang
Fei Huang
Jitao Sang
LM&Ro
LLMAG
378
144
0
03 Jun 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
International Conference on Learning Representations (ICLR), 2024
Christopher Rawles
Sarah Clinckemaillie
Yifan Chang
Jonathan Waltz
Gabrielle Lau
...
Daniel Toyama
Robert Berry
Divya Tyamagundlu
Timothy Lillicrap
Oriana Riva
LLMAG
658
186
0
23 May 2024
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Keen You
Haotian Zhang
E. Schoop
Floris Weers
Amanda Swearngin
Jeffrey Nichols
Yinfei Yang
Zhe Gan
MLLM
357
150
0
08 Apr 2024
Android in the Zoo: Chain-of-Action-Thought for GUI Agents
Jiwen Zhang
Jihao Wu
Yihua Teng
Minghui Liao
Nuo Xu
Xiao Xiao
Zhongyu Wei
Duyu Tang
LLMAG
LM&Ro
480
129
0
05 Mar 2024
UFO: A UI-Focused Agent for Windows OS Interaction
Chaoyun Zhang
Liqun Li
Shilin He
Xu Zhang
Bo Qiao
...
Yu Kang
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
Qi Zhang
LLMAG
554
129
0
08 Feb 2024
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Junyang Wang
Haiyang Xu
Jiabo Ye
Mingshi Yan
Weizhou Shen
Ji Zhang
Fei Huang
Jitao Sang
331
226
0
29 Jan 2024
GPT-4V(ision) is a Generalist Web Agent, if Grounded
International Conference on Machine Learning (ICML), 2024
Boyuan Zheng
Boyu Gou
Jihyung Kil
Huan Sun
Yu-Chuan Su
MLLM
VLM
LLMAG
407
407
0
03 Jan 2024
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
An Yan
Zhengyuan Yang
Wanrong Zhu
Kevin Qinghong Lin
Linjie Li
...
Yiwu Zhong
Julian McAuley
Jianfeng Gao
Zicheng Liu
Lijuan Wang
LLMAG
LM&Ro
402
145
0
13 Nov 2023
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
Jianwei Yang
Hao Zhang
Feng Li
Xueyan Zou
Chun-yue Li
Jianfeng Gao
MLLM
VLM
447
269
0
17 Oct 2023
ILuvUI: Instruction-tuned LangUage-Vision modeling of UIs from Machine Conversations
International Conference on Intelligent User Interfaces (IUI), 2023
Yue Jiang
E. Schoop
Amanda Swearngin
Jeffrey Nichols
MLLM
359
25
0
07 Oct 2023
You Only Look at Screens: Multimodal Chain-of-Action Agents
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhuosheng Zhang
Aston Zhang
LLMAG
LM&Ro
489
170
0
20 Sep 2023
AutoDroid: LLM-powered Task Automation in Android
ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom), 2023
Hao Wen
Yuanchun Li
Guohong Liu
Shanhui Zhao
Tao Yu
Toby Jia-Jun Li
Shiqi Jiang
Yunhao Liu
Yaqin Zhang
Yunxin Liu
431
183
0
29 Aug 2023
A Survey on Large Language Model based Autonomous Agents
Lei Wang
Chengbang Ma
Xueyang Feng
Zeyu Zhang
Hao-ran Yang
...
Xu Chen
Yankai Lin
Wayne Xin Zhao
Zhewei Wei
Ji-Rong Wen
LLMAG
AI4CE
LM&Ro
670
2,158
0
22 Aug 2023
WebArena: A Realistic Web Environment for Building Autonomous Agents
International Conference on Learning Representations (ICLR), 2023
Shuyan Zhou
Frank F. Xu
Hao Zhu
Xuhui Zhou
Robert Lo
...
Tianyue Ou
Yonatan Bisk
Daniel Fried
Uri Alon
Graham Neubig
LLMAG
685
861
0
25 Jul 2023
Android in the Wild: A Large-Scale Dataset for Android Device Control
Neural Information Processing Systems (NeurIPS), 2023
Christopher Rawles
Alice Li
Daniel Rodriguez
Oriana Riva
Timothy Lillicrap
LM&Ro
409
256
0
19 Jul 2023
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
International Conference on Learning Representations (ICLR), 2023
Hiroki Furuta
Kuang-Huei Lee
Ofir Nachum
Yutaka Matsuo
Aleksandra Faust
S. Gu
Izzeddin Gur
LM&Ro
427
143
0
19 May 2023
Language Models can Solve Computer Tasks
Neural Information Processing Systems (NeurIPS), 2023
Geunwoo Kim
Pierre Baldi
Alexander Shmakov
LLMAG
LM&Ro
572
467
0
30 Mar 2023
ReAct: Synergizing Reasoning and Acting in Language Models
International Conference on Learning Representations (ICLR), 2022
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
2.6K
5,491
0
06 Oct 2022
Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus
International Conference on Learning Representations (ICLR), 2022
Gang Li
Yang Li
361
82
0
29 Sep 2022
A data-driven approach for learning to control computers
International Conference on Machine Learning (ICML), 2022
Peter C. Humphreys
David Raposo
Tobias Pohlen
Gregory Thornton
Rachita Chhaparia
...
Josh Abramson
Petko Georgiev
Alex Goldin
Adam Santoro
Timothy Lillicrap
334
115
0
16 Feb 2022
Environment Generation for Zero-Shot Compositional Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2022
Izzeddin Gur
Natasha Jaques
Yingjie Miao
Jongwook Choi
Manoj Kumar Tiwari
Honglak Lee
Aleksandra Faust
264
45
0
21 Jan 2022
VUT: Versatile UI Transformer for Multi-Modal Multi-Task User Interface Modeling
Yang Li
Gang Li
Xin Zhou
Mostafa Dehghani
A. Gritsenko
MLLM
182
40
0
10 Dec 2021
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
ACM Symposium on User Interface Software and Technology (UIST), 2021
Bryan Wang
Gang Li
Xin Zhou
Zhourong Chen
Tovi Grossman
Yang Li
850
198
0
07 Aug 2021
UIBert: Learning Generic Multimodal Representations for UI Understanding
International Joint Conference on Artificial Intelligence (IJCAI), 2021
Chongyang Bai
Xiaoxue Zang
Ying Xu
Srinivas Sunkara
Abhinav Rastogi
Jindong Chen
Blaise Agüera y Arcas
274
113
0
29 Jul 2021
Screen Recognition: Creating Accessibility Metadata for Mobile Applications from Pixels
International Conference on Human Factors in Computing Systems (CHI), 2021
Xiaoyi Zhang
Lilian de Greef
Amanda Swearngin
Samuel White
Kyle I. Murray
...
Jeffrey Nichols
Jason Wu
Chris Fleizach
Aaron Everitt
Jeffrey P. Bigham
746
192
0
13 Jan 2021
Learning to Navigate the Web
Izzeddin Gur
U. Rückert
Aleksandra Faust
Dilek Z. Hakkani-Tür
226
71
0
21 Dec 2018
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Emmy Liu
Kelvin Guu
Panupong Pasupat
Tianlin Shi
Abigail Z. Jacobs
OnRL
211
281
0
24 Feb 2018