2312.13108
Cited By
ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation
20 December 2023
Difei Gao
Lei Ji
Zechen Bai
Mingyu Ouyang
Peiran Li
Dongxing Mao
Qinchen Wu
Weichen Zhang
Peiyi Wang
Xiangwu Guo
Hengxu Wang
Luowei Zhou
Mike Zheng Shou
LLMAG
Papers citing
"ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation"
10 / 10 papers shown
VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making
Jake Grigsby
Yuke Zhu
Michael S Ryoo
Juan Carlos Niebles
OffRL
VLM
34
0
0
06 May 2025
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Zhangheng Li
Keen You
H. Zhang
Di Feng
Harsh Agrawal
Xiujun Li
Mohana Prasad Sathya Moorthy
Jeff Nichols
Y. Yang
Zhe Gan
MLLM
51
18
0
24 Oct 2024
VideoGUI: A Benchmark for GUI Automation from Instructional Videos
Kevin Qinghong Lin
Linjie Li
Difei Gao
Qinchen Wu
Mingyi Yan
Zhengyuan Yang
Lijuan Wang
Mike Zheng Shou
39
10
0
14 Jun 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Christopher Rawles
Sarah Clinckemaillie
Yifan Chang
Jonathan Waltz
Gabrielle Lau
...
Daniel Toyama
Robert Berry
Divya Tyamagundlu
Timothy Lillicrap
Oriana Riva
LLMAG
62
44
0
23 May 2024
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Keen You
Haotian Zhang
E. Schoop
Floris Weers
Amanda Swearngin
Jeffrey Nichols
Yinfei Yang
Zhe Gan
MLLM
39
82
0
08 Apr 2024
A Zero-Shot Language Agent for Computer Control with Structured Reflection
Tao Li
Gang Li
Zhiwei Deng
Bryan Wang
Yang Li
LM&Ro
LLMAG
54
23
0
12 Oct 2023
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
Kenton Lee
Mandar Joshi
Iulia Turc
Hexiang Hu
Fangyu Liu
Julian Martin Eisenschlos
Urvashi Khandelwal
Peter Shaw
Ming-Wei Chang
Kristina Toutanova
CLIP
VLM
158
262
0
07 Oct 2022
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
233
2,470
0
06 Oct 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,402
0
28 Jan 2022
You Only Look Once: Unified, Real-Time Object Detection
Joseph Redmon
S. Divvala
Ross B. Girshick
Ali Farhadi
ObjD
281
36,178
0
08 Jun 2015