2408.00203
Cited By
OmniParser for Pure Vision Based GUI Agent
1 August 2024
Yadong Lu
Jianwei Yang
Yelong Shen
Ahmed Hassan Awadallah
MLLM
Papers citing
"OmniParser for Pure Vision Based GUI Agent"
13 / 13 papers shown
Title
Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI
Benjamin Raphael Ernhofer
Daniil Prokhorov
Jannica Langner
Dominik Bollmann
39
0
0
09 May 2025
G-FOCUS: Towards a Robust Method for Assessing UI Design Persuasiveness
Jaehyun Jeon
Janghan Yoon
Minsoo Kim
Sumin Shim
Yejin Choi
Hanbin Kim
Youngjae Yu
AAML
47
0
0
08 May 2025
Visual Test-time Scaling for GUI Agent Grounding
Tiange Luo
Lajanugen Logeswaran
Justin Johnson
Honglak Lee
51
0
0
01 May 2025
UFO2: The Desktop AgentOS
Chaoyun Zhang
He Huang
Chiming Ni
J. Mu
Si Qin
...
Minghua Ma
Jian-Guang Lou
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
LLMAG
34
0
0
20 Apr 2025
UI-E2I-Synth: Advancing GUI Grounding with Large-Scale Instruction Synthesis
Xinyi Liu
Xiaoyi Zhang
Ziyun Zhang
Yan Lu
39
0
0
15 Apr 2025
UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning
Zhengxi Lu
Yuxiang Chai
Yaxuan Guo
Xi Yin
Liang Liu
Hao Wang
Han Xiao
Shuai Ren
Guanjing Xiong
Hao Li
LLMAG
LRM
78
10
0
27 Mar 2025
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
Shravan Nayak
Xiangru Jian
Kevin Qinghong Lin
Juan A. Rodriguez
Montek Kalsi
...
David Vazquez
Christopher Pal
Perouz Taslakian
Spandana Gella
Sai Rajeswar
192
0
0
19 Mar 2025
AppAgentX: Evolving GUI Agents as Proficient Smartphone Users
Wenjia Jiang
Yangyang Zhuang
Chenxi Song
Xu Yang
Chi Zhang
LLMAG
96
1
0
04 Mar 2025
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang
Reuben Tan
Qianhui Wu
Ruijie Zheng
Baolin Peng
...
Seonghyeon Ye
Joel Jang
Yuquan Deng
Lars Liden
Jianfeng Gao
VLM
AI4TS
122
9
0
18 Feb 2025
MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild
Xi Fang
Jiankun Wang
X. Cai
Shangqian Chen
Shuwen Yang
Lin Yao
Linfeng Zhang
Guolin Ke
50
1
0
17 Nov 2024
CogAgent: A Visual Language Model for GUI Agents
Wenyi Hong
Weihan Wang
Qingsong Lv
Jiazheng Xu
Wenmeng Yu
...
Juanzi Li
Bin Xu
Yuxiao Dong
Ming Ding
Jie Tang
MLLM
142
321
0
14 Dec 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
270
4,244
0
30 Jan 2023
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
Bryan Wang
Gang Li
Xin Zhou
Zhourong Chen
Tovi Grossman
Yang Li
167
152
0
07 Aug 2021