2504.07981
Cited By
ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use
4 April 2025
Kaixin Li
Ziyang Meng
Hongzhan Lin
Ziyang Luo
Yuchen Tian
Jing Ma
Zhiyong Huang
Tat-Seng Chua
Papers citing "ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use"
24 / 74 papers shown
Title
Phi-Ground Tech Report: Advancing Perception in GUI Grounding
Miaosen Zhang
Ziqiang Xu
Jialiang Zhu
Qi Dai
Kai Qiu
...
Chong Luo
Tianyi Chen
Justin Wagle
Tim Franklin
Baining Guo
LRM
148
8
0
31 Jul 2025
UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding
Shuquan Lian
Yuhang Wu
Jia Ma
Yifan Ding
Zihan Song
Bingqi Chen
Xiawu Zheng
Hui Li
LLMAG
536
10
0
29 Jul 2025
OS-MAP: How Far Can Computer-Using Agents Go in Breadth and Depth?
Xuetian Chen
Yinghao Chen
Xinfeng Yuan
Zhuo Peng
Lu Chen
...
Tianbao Xie
Zhiyong Wu
Qiushi Sun
Biqing Qi
Bowen Zhou
143
3
0
25 Jul 2025
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents
X. Wang
Zhenyu Wu
JingJing Xie
Zichen Ding
Bowen Yang
...
Weijie Su
X. Zhu
Wei Shen
Jifeng Dai
Wenhai Wang
LLMAG
194
14
0
25 Jul 2025
MagicGUI: A Foundational Mobile GUI Agent with Scalable Data Pipeline and Reinforcement Fine-tuning
Liujian Tang
Shaokang Dong
Y. Huang
Minqi Xiang
Hongtao Ruan
...
Qi Zhang
Kang Wang
Y. Zhang
Y. Wang
Yuran Wang
LM&Ro
301
6
0
19 Jul 2025
GTA1: GUI Test-time Scaling Agent
Yan Yang
Dongxu Li
Yutong Dai
Yuhao Yang
Ziyang Luo
...
Ran Xu
Liyuan Pan
Silvio Savarese
Caiming Xiong
Junnan Li
LLMAG
342
33
0
08 Jul 2025
MiMo-VL Technical Report
Xiaomi LLM-Core Team
Zihao Yue
Zhenru Lin
Yifan Song
Weikun Wang
...
Di Zhang
Chong Ma
Chang Liu
Can Cai
Bingquan Xia
OffRL
MoE
VLM
LRM
211
10
0
04 Jun 2025
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
Qianhui Wu
Kanzhi Cheng
Rui Yang
Chaoyun Zhang
Jianwei Yang
...
Huan Zhang
Tong Zhang
Jianbing Zhang
Dongmei Zhang
J. Gao
LM&Ro
220
30
0
03 Jun 2025
Surfer-H Meets Holo1: Cost-Efficient Web Agent Powered by Open Weights
M. Andreux
Breno Baldas Skuk
Hamza Benchekroun
Emilien Biré
Antoine Bonnet
...
Marc Thibault
L. Thiry
Léo Tronchon
Nicolas Usunier
Tony Wu
LLMAG
164
0
0
03 Jun 2025
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
Zhong Zhang
Yaxi Lu
Yikun Fu
Yupeng Huo
Shenzhi Yang
...
Chongyi Wang
Chi Chen
Yuan Yao
Zhiyuan Liu
Maosong Sun
LLMAG
ALM
263
16
0
02 Jun 2025
MMTBENCH: A Unified Benchmark for Complex Multimodal Table Reasoning
Prasham Yatinkumar Titiya
Jainil Trivedi
Chitta Baral
Vivek Gupta
LMTD
210
3
0
27 May 2025
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Yan Ma
Linge Du
Xuyang Shen
Shaoxiang Chen
Pengfei Li
Qibing Ren
Lizhuang Ma
Yuchao Dai
Pengfei Liu
Junjie Yan
OffRL
LRM
316
0
0
23 May 2025
ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search
Hyunseok Lee
Jeonghoon Kim
Beomjun Kim
Jihoon Tack
Chansong Jo
Jaehong Lee
Cheonbok Park
Sookyo In
Jinwoo Shin
Kang Min Yoo
332
5
0
21 May 2025
Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning
Xinbin Yuan
Jian Zhang
K. Li
Zhuoxuan Cai
Lujian Yao
...
Enguang Wang
Qibin Hou
Jinwei Chen
Peng-Tao Jiang
Bo Li
371
29
0
18 May 2025
GUI-Shift: Enhancing VLM-Based GUI Agents through Self-supervised Reinforcement Learning
Longxi Gao
Li Zhang
Mengwei Xu
Wei Liu
Jian Luan
Mengwei Xu
333
4
0
18 May 2025
Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey
Da Zheng
Lun Du
Junwei Su
Yuchen Tian
Yuqi Zhu
Jintian Zhang
Lanning Wei
Xin Xu
Ningyu Zhang
LRM
441
5
0
06 May 2025
Visual Test-time Scaling for GUI Agent Grounding
Tiange Luo
Lajanugen Logeswaran
Justin Johnson
Honglak Lee
297
10
0
01 May 2025
InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners
Yuhang Liu
Pengxiang Li
C. Xie
Xavier Hu
Xiaotian Han
Shengyu Zhang
Hongxia Yang
Fei Wu
LLMAG
LM&Ro
LRM
AI4CE
326
63
0
19 Apr 2025
TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents
Bofei Zhang
Zirui Shang
Zhi Gao
Wang Zhang
Rui Xie
Xiaojian Ma
Tao Yuan
Xinxiao Wu
Song-Chun Zhu
Qing Li
LLMAG
308
21
0
17 Apr 2025
GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents
Run Luo
Lu Wang
Wanwei He
Longze Chen
Jiaming Li
Xiaobo Xia
LLMAG
628
120
0
14 Apr 2025
Kimi-VL Technical Report
Kimi Team
Angang Du
B. Yin
Bowei Xing
Bowen Qu
...
Longxiang Zhang
Zhe Chen
Zijia Zhao
Ziwei Chen
Zongyu Lin
MLLM
VLM
MoE
804
125
0
10 Apr 2025
A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models
Liangbo Ning
Ziran Liang
Zhuohang Jiang
Haohao Qu
Yujuan Ding
...
Xiao Wei
Shanru Lin
Hui Liu
Philip S. Yu
Qing Li
LLMAG
LM&Ro
479
44
0
30 Mar 2025
Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Pei Fu
Tongkun Guan
Zining Wang
Zhentao Guo
Chen Duan
...
Boming Chen
Jiayao Ma
Qianyi Jiang
Kai Zhou
Junfeng Luo
VLM
367
1
0
23 Feb 2025
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Haodong Duan
Xinyu Fang
Junming Yang
Xiangyu Zhao
Lin Chen
...
Yuhang Zang
Pan Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LM&MA
VLM
642
332
0
16 Jul 2024