ResearchTrend.AI
arXiv:2311.07562
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

13 November 2023
An Yan
Zhengyuan Yang
Wanrong Zhu
K. Lin
Linjie Li
Jianfeng Wang
Jianwei Yang
Yiwu Zhong
Julian McAuley
Jianfeng Gao
Zicheng Liu
Lijuan Wang
    LLMAG
    LM&Ro

Papers citing "GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation"

31 / 81 papers shown
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
An Yan
Zhengyuan Yang
Junda Wu
Wanrong Zhu
Jianwei Yang
...
K. Lin
Jianfeng Wang
Julian McAuley
Jianfeng Gao
Lijuan Wang
LRM
34
12
0
25 Apr 2024
Enhancing Mobile "How-to" Queries with Automated Search Results Verification and Reranking
Lei Ding
Jeshwanth Bheemanpally
Yi Zhang
27
1
0
13 Apr 2024
Training a Vision Language Model as Smartphone Assistant
Nicolai Dorka
Janusz Marecki
Ammar Anwar
16
3
0
12 Apr 2024
LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Automation Task Evaluation
Li Lyna Zhang
Shihe Wang
Xianqing Jia
Zhihan Zheng
Yun-Yu Yan
Longxi Gao
Yuanchun Li
Mengwei Xu
LLMAG
25
10
0
12 Apr 2024
Autonomous Evaluation and Refinement of Digital Agents
Jiayi Pan
Yichi Zhang
Nicholas Tomlin
Yifei Zhou
Sergey Levine
Alane Suhr
ELM
36
49
0
09 Apr 2024
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Keen You
Haotian Zhang
E. Schoop
Floris Weers
Amanda Swearngin
Jeffrey Nichols
Yinfei Yang
Zhe Gan
MLLM
39
82
0
08 Apr 2024
VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation
Hao Wang
Jiayou Qin
Ashish Bastola
Xiwen Chen
John Suchanek
Zihao Gong
Abolfazl Razi
35
15
0
19 Mar 2024
Android in the Zoo: Chain-of-Action-Thought for GUI Agents
Jiwen Zhang
Jihao Wu
Yihua Teng
Minghui Liao
Nuo Xu
Xiao Xiao
Zhongyu Wei
Duyu Tang
LLMAG
LM&Ro
32
50
0
05 Mar 2024
Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering
Chenglei Si
Yanzhe Zhang
Zhengyuan Yang
Ruibo Liu
Diyi Yang
14
1
0
05 Mar 2024
Large Multimodal Agents: A Survey
Junlin Xie
Zhihong Chen
Ruifei Zhang
Xiang Wan
Guanbin Li
LM&Ro
LLMAG
37
38
0
23 Feb 2024
GenSERP: Large Language Models for Whole Page Presentation
Zhenning Zhang
Yunan Zhang
Suyu Ge
Guangwei Weng
M. Narang
Xia Song
Saurabh Tiwary
KELM
77
2
0
22 Feb 2024
CoCo-Agent: A Comprehensive Cognitive MLLM Agent for Smartphone GUI Automation
Xinbei Ma
Zhuosheng Zhang
Hai Zhao
LLMAG
33
21
0
19 Feb 2024
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Soroush Nasiriany
Fei Xia
Wenhao Yu
Ted Xiao
Jacky Liang
...
Karol Hausman
N. Heess
Chelsea Finn
Sergey Levine
Brian Ichter
LM&Ro
LRM
25
90
0
12 Feb 2024
Understanding the Weakness of Large Language Model Agents within a Complex Android Environment
Mingzhe Xing
Rongkai Zhang
Hui Xue
Qi Chen
Fan Yang
Zhengjin Xiao
LLMAG
ELM
AAML
26
23
0
09 Feb 2024
AI Assistance for UX: A Literature Review Through Human-Centered AI
Yuwen Lu
Yuewen Yang
Qinyi Zhao
Chengzhi Zhang
Toby Jia-Jun Li
9
16
0
08 Feb 2024
UFO: A UI-Focused Agent for Windows OS Interaction
Chaoyun Zhang
Liqun Li
Shilin He
Xu Zhang
Bo Qiao
...
Yu Kang
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
Qi Zhang
LLMAG
58
66
0
08 Feb 2024
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
Hongliang He
Wenlin Yao
Kaixin Ma
Wenhao Yu
Yong Dai
Hongming Zhang
Zhenzhong Lan
Dong Yu
LLMAG
30
121
0
25 Jan 2024
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
Jing Yu Koh
Robert Lo
Lawrence Jang
Vikram Duvvur
Ming Chong Lim
Po-Yu Huang
Graham Neubig
Shuyan Zhou
Ruslan Salakhutdinov
Daniel Fried
23
0
0
24 Jan 2024
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Kanzhi Cheng
Qiushi Sun
Yougang Chu
Fangzhi Xu
Yantao Li
Jianbing Zhang
Zhiyong Wu
LLMAG
170
138
0
17 Jan 2024
MobileAgent: enhancing mobile control via human-machine interaction and SOP integration
Tinghe Ding
LLMAG
LM&Ro
34
6
0
04 Jan 2024
GPT-4V(ision) is a Generalist Web Agent, if Grounded
Boyuan Zheng
Boyu Gou
Jihyung Kil
Huan Sun
Yu-Chuan Su
MLLM
VLM
LLMAG
41
205
0
03 Jan 2024
AppAgent: Multimodal Agents as Smartphone Users
C. Zhang
Zhao Yang
Jiaxuan Liu
Yucheng Han
Xin Chen
Zebiao Huang
Bin-Bin Fu
Gang Yu
LM&Ro
LLMAG
19
77
0
21 Dec 2023
ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation
Difei Gao
Lei Ji
Zechen Bai
Mingyu Ouyang
Peiran Li
...
Peiyi Wang
Xiangwu Guo
Hengxu Wang
Luowei Zhou
Mike Zheng Shou
LLMAG
23
21
0
20 Dec 2023
UINav: A Practical Approach to Train On-Device Automation Agents
Wei Li
Fu-Lin Hsu
Will Bishop
Folawiyo Campbell-Ajala
Max Lin
Oriana Riva
6
3
0
15 Dec 2023
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation
Zhengyuan Yang
Jianfeng Wang
Linjie Li
Kevin Qinghong Lin
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
LRM
MLLM
DiffM
13
22
0
12 Oct 2023
You Only Look at Screens: Multimodal Chain-of-Action Agents
Zhuosheng Zhang
Aston Zhang
LLMAG
LM&Ro
13
98
0
20 Sep 2023
Multimodal Foundation Models: From Specialists to General-Purpose Assistants
Chunyuan Li
Zhe Gan
Zhengyuan Yang
Jianwei Yang
Linjie Li
Lijuan Wang
Jianfeng Gao
MLLM
110
226
0
18 Sep 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
233
2,470
0
06 Oct 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
293
4,048
0
24 May 2022
Pix2seq: A Language Modeling Framework for Object Detection
Ting Chen
Saurabh Saxena
Lala Li
David J. Fleet
Geoffrey E. Hinton
MLLM
ViT
VLM
233
344
0
22 Sep 2021
Screen Recognition: Creating Accessibility Metadata for Mobile Applications from Pixels
Xiaoyi Zhang
Lilian de Greef
Amanda Swearngin
Samuel White
Kyle I. Murray
...
Jeffrey Nichols
Jason Wu
Chris Fleizach
Aaron Everitt
Jeffrey P. Bigham
174
166
0
13 Jan 2021