ResearchTrend.AI
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

13 November 2023
An Yan
Zhengyuan Yang
Wanrong Zhu
K. Lin
Linjie Li
Jianfeng Wang
Jianwei Yang
Yiwu Zhong
Julian McAuley
Jianfeng Gao
Zicheng Liu
Lijuan Wang
    LLMAG
    LM&Ro
arXiv: 2311.07562

Papers citing "GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation"

50 / 81 papers shown
EcoAgent: An Efficient Edge-Cloud Collaborative Multi-Agent Framework for Mobile Automation
Biao Yi
Xavier Hu
Y. Chen
Shengyu Zhang
Hongxia Yang
Fan Wu
Fei Wu
LLMAG
107
0
0
08 May 2025
Task-Oriented Semantic Communication in Large Multimodal Models-based Vehicle Networks
Baoxia Du
H. Du
Dusit Niyato
Ruidong Li
53
0
0
05 May 2025
Visual Test-time Scaling for GUI Agent Grounding
Tiange Luo
Lajanugen Logeswaran
Justin Johnson
Honglak Lee
51
0
0
01 May 2025
Guiding VLM Agents with Process Rewards at Inference Time for GUI Navigation
Zhiyuan Hu
Shiyun Xiong
Yifan Zhang
See-Kiong Ng
Anh Tuan Luu
Bo An
Shuicheng Yan
Bryan Hooi
33
0
0
22 Apr 2025
The Obvious Invisible Threat: LLM-Powered GUI Agents' Vulnerability to Fine-Print Injections
C. L. P. Chen
Zhiping Zhang
Bingcan Guo
Shang Ma
Ibrahim Khalilov
...
Yanfang Ye
Ziang Xiao
Yaxing Yao
Tianshi Li
T. Li
AAML
LLMAG
SILM
31
1
0
15 Apr 2025
Socratic Chart: Cooperating Multiple Agents for Robust SVG Chart Understanding
Yuyang Ji
Haohan Wang
LRM
31
0
0
14 Apr 2025
MP-GUI: Modality Perception with MLLMs for GUI Understanding
Ziwei Wang
Weizhi Chen
Leyang Yang
Sheng Zhou
Shengchu Zhao
Hanbei Zhan
Jiongchao Jin
Liangcheng Li
Zirui Shao
Jiajun Bu
60
1
0
18 Mar 2025
3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o
Dingning Liu
Cheng Wang
Peng Gao
Renrui Zhang
Xinzhu Ma
Yuan Meng
Zhihui Wang
LRM
44
0
0
17 Mar 2025
In-Context Defense in Computer Agents: An Empirical Study
Pei Yang
Hai Ci
Mike Zheng Shou
AAML
LLMAG
80
0
0
12 Mar 2025
VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model
Jiani Zheng
Lu Wang
Fangkai Yang
C. Zhang
Lingrui Mei
Wenjie Yin
Qingwei Lin
Dongmei Zhang
Saravan Rajmohan
Qi Zhang
OffRL
56
2
0
26 Feb 2025
MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions
Yuxuan Liu
Hongda Sun
Wei Liu
Jian Luan
Bo Du
Rui Yan
48
2
0
24 Feb 2025
Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
Vardaan Pahuja
Yadong Lu
Corby Rosset
Boyu Gou
Arindam Mitra
Spencer Whitehead
Yu Su
Ahmed Awadallah
LLMAG
LM&Ro
Presented at ResearchTrend Connect | LLMAG on 14 Mar 2025
149
3
1
20 Feb 2025
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang
Reuben Tan
Qianhui Wu
Ruijie Zheng
Baolin Peng
...
Seonghyeon Ye
Joel Jang
Yuquan Deng
Lars Liden
Jianfeng Gao
VLM
AI4TS
107
9
0
18 Feb 2025
Scaling Autonomous Agents via Automatic Reward Modeling And Planning
Zhenfang Chen
Delin Chen
Rui Sun
Wenjun Liu
Chuang Gan
LLMAG
58
3
0
17 Feb 2025
Digi-Q: Learning Q-Value Functions for Training Device-Control Agents
Hao Bai
Yifei Zhou
Li Erran Li
Sergey Levine
Aviral Kumar
OffRL
45
1
0
13 Feb 2025
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Zehan Qi
Xiao-Chang Liu
Iat Long Iong
Hanyu Lai
X. Sun
...
Shuntian Yao
Tianjie Zhang
Wei Xu
J. Tang
Yuxiao Dong
93
14
0
28 Jan 2025
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Kevin Qinghong Lin
Linjie Li
Difei Gao
Z. Yang
Shiwei Wu
Zechen Bai
Weixian Lei
Lijuan Wang
Mike Zheng Shou
LLMAG
72
13
0
26 Nov 2024
Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms
Minghe Gao
Wendong Bu
Bingchen Miao
Yang Wu
Yunfei Li
Juncheng Billy Li
Siliang Tang
Qi Wu
Yueting Zhuang
Meng Wang
LM&Ro
33
3
0
17 Nov 2024
GUI Agents with Foundation Models: A Comprehensive Survey
Shuai Wang
W. Liu
Jingxuan Chen
Weinan Gan
Xingshan Zeng
...
Bin Wang
Chuhan Wu
Yasheng Wang
Ruiming Tang
Jianye Hao
LLMAG
68
12
0
07 Nov 2024
Foundations and Recent Trends in Multimodal Mobile Agents: A Survey
Biao Wu
Yanda Li
Meng Fang
Zirui Song
Zhiwei Zhang
Yunchao Wei
L. Chen
LM&Ro
LLMAG
OffRL
AI4TS
39
4
0
04 Nov 2024
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
Yifan Xu
Xiao Liu
X. Sun
Siyi Cheng
Hao Yu
Hanyu Lai
Shudan Zhang
Dan Zhang
Jie Tang
Yuxiao Dong
LLMAG
44
7
0
31 Oct 2024
EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic Data
Xuetian Chen
Hangcheng Li
Jiaqing Liang
Sihang Jiang
Deqing Yang
LLMAG
46
2
0
25 Oct 2024
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Boyu Gou
Ruohan Wang
Boyuan Zheng
Yanan Xie
Cheng Chang
Yiheng Shu
Huan Sun
Yu Su
LM&Ro
LLMAG
76
48
0
07 Oct 2024
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
John Yang
Carlos E. Jimenez
Alex Zhang
K. Lieret
Joyce Yang
...
Gabriel Synnaeve
Karthik Narasimhan
Diyi Yang
Sida I. Wang
Ofir Press
31
22
0
04 Oct 2024
Dynamic Planning for LLM-based Graphical User Interface Automation
Shaoqing Zhang
Zhuosheng Zhang
Kehai Chen
Xinbei Ma
Muyun Yang
Tiejun Zhao
Min Zhang
LLMAG
29
7
0
01 Oct 2024
Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents
Junting Lu
Zhiyang Zhang
Fangkai Yang
Jue Zhang
Lu Wang
Chao Du
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
Qi Zhang
LLMAG
28
1
0
25 Sep 2024
MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding
Qinzhuo Wu
Weikai Xu
Wei Liu
Tao Tan
Jianfeng Liu
Ang Li
Jian Luan
Bin Wang
Shuo Shang
VLM
32
10
0
23 Sep 2024
MobileViews: A Large-Scale Mobile GUI Dataset
Longxi Gao
Li Zhang
Shihe Wang
Shangguang Wang
Yuanchun Li
Mengwei Xu
28
5
0
22 Sep 2024
NaviQAte: Functionality-Guided Web Application Navigation
M. Shahbandeh
Parsa Alian
Noor Nashid
Ali Mesbah
23
2
0
16 Sep 2024
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
Rogerio Bonatti
Dan Zhao
Francesco Bonacci
Dillon Dupont
Sara Abdali
...
Justin Wagle
K. Koishida
A. Bucker
Lawrence Jang
Zack Hui
LLMAG
43
26
0
12 Sep 2024
Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions
Xinbei Ma
Yiting Wang
Yao Yao
Tongxin Yuan
Aston Zhang
Zhuosheng Zhang
Hai Zhao
AAML
LLMAG
27
16
0
05 Aug 2024
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities
Weihao Yu
Zhengyuan Yang
Linfeng Ren
Linjie Li
Jianfeng Wang
K. Lin
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
Xinchao Wang
VLM
MLLM
36
17
0
01 Aug 2024
OmniParser for Pure Vision Based GUI Agent
Yadong Lu
Jianwei Yang
Yelong Shen
Ahmed Hassan Awadallah
MLLM
27
33
0
01 Aug 2024
Systematic Categorization, Construction and Evaluation of New Attacks against Multi-modal Mobile GUI Agents
Yulong Yang
Xinshan Yang
Shuaidong Li
Chenhao Lin
Zhengyu Zhao
Chao Shen
Tianwei Zhang
40
1
0
12 Jul 2024
Flowy: Supporting UX Design Decisions Through AI-Driven Pattern Annotation in Multi-Screen User Flows
Yuwen Lu
Ziang Tong
Qinyi Zhao
Yewon Oh
Bryan Wang
Toby Jia-Jun Li
44
6
0
23 Jun 2024
Crepe: A Mobile Screen Data Collector Using Graph Query
Yuwen Lu
Meng Chen
Qi Zhao
Victor Cox
Yang Yang
Meng Jiang
Jay Brockman
Tamara Kay
Toby Jia-Jun Li
26
1
0
23 Jun 2024
E-ANT: A Large-Scale Dataset for Efficient Automatic GUI NavigaTion
Ke Wang
Tianyu Xia
Zhangxuan Gu
Yi Zhao
Shuheng Shen
Changhua Meng
Weiqiang Wang
Ke Xu
31
0
0
20 Jun 2024
GUI Action Narrator: Where and When Did That Action Take Place?
Qinchen Wu
Difei Gao
Kevin Qinghong Lin
Zhuoyu Wu
Xiangwu Guo
Peiran Li
Weichen Zhang
Hengxu Wang
Mike Zheng Shou
34
3
0
19 Jun 2024
Do Multimodal Foundation Models Understand Enterprise Workflows? A Benchmark for Business Process Management Tasks
Michael Wornow
A. Narayan
Ben T Viggiano
Ishan S. Khare
Tathagat Verma
...
Joshua Martinez
Vardhan Agrawal
Althea Hudson
N. Shah
Christopher Ré
35
4
0
19 Jun 2024
GUICourse: From General Vision Language Models to Versatile GUI Agents
Wentong Chen
Junbo Cui
Jinyi Hu
Yujia Qin
Junjie Fang
...
Yupeng Huo
Yuan Yao
Yankai Lin
Zhiyuan Liu
Maosong Sun
LLMAG
31
30
0
17 Jun 2024
VideoGUI: A Benchmark for GUI Automation from Instructional Videos
Kevin Qinghong Lin
Linjie Li
Difei Gao
Qinchen Wu
Mingyi Yan
Zhengyuan Yang
Lijuan Wang
Mike Zheng Shou
39
10
0
14 Jun 2024
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
Yushi Hu
Weijia Shi
Xingyu Fu
Dan Roth
Mari Ostendorf
Luke Zettlemoyer
Noah A. Smith
Ranjay Krishna
LRM
37
36
0
13 Jun 2024
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Quanfeng Lu
Wenqi Shao
Zitao Liu
Fanqing Meng
Boxuan Li
Botong Chen
Siyuan Huang
Kaipeng Zhang
Yu Qiao
Ping Luo
43
26
0
12 Jun 2024
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
Junyang Wang
Haiyang Xu
Haitao Jia
Xi Zhang
Ming Yan
Weizhou Shen
Ji Zhang
Fei Huang
Jitao Sang
LM&Ro
LLMAG
29
45
0
03 Jun 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Christopher Rawles
Sarah Clinckemaillie
Yifan Chang
Jonathan Waltz
Gabrielle Lau
...
Daniel Toyama
Robert Berry
Divya Tyamagundlu
Timothy Lillicrap
Oriana Riva
LLMAG
62
44
0
23 May 2024
Human-Centered LLM-Agent User Interface: A Position Paper
Daniel Y. Chin
Yuxuan Wang
Gus Xia
LLMAG
27
1
0
19 May 2024
Latent State Estimation Helps UI Agents to Reason
Will Bishop
Alice Li
Christopher Rawles
Oriana Riva
LRM
LLMAG
19
3
0
17 May 2024
Automating the Enterprise with Foundation Models
Michael Wornow
A. Narayan
Krista Opsahl-Ong
Quinn McIntyre
Nigam H. Shah
Christopher Ré
AI4CE
31
9
0
03 May 2024
MMAC-Copilot: Multi-modal Agent Collaboration Operating Copilot
Zirui Song
Yaohang Li
Meng Fang
Zhenhao Chen
Zecheng Shi
Yuan Huang
Ling-Hao Chen
Xiuying Chen
Ling Chen
LLMAG
34
1
0
28 Apr 2024
Benchmarking Mobile Device Control Agents across Diverse Configurations
Juyong Lee
Taywon Min
Minyong An
Changyeon Kim
Kimin Lee
31
8
0
25 Apr 2024