2504.07981
ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use
4 April 2025
Kaixin Li
Ziyang Meng
Hongzhan Lin
Ziyang Luo
Yuchen Tian
Jing Ma
Zhiyong Huang
Tat-Seng Chua
Papers citing
"ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use"
50 / 74 papers shown
Title
Fara-7B: An Efficient Agentic Model for Computer Use
Ahmed Awadallah
Yash Lara
Raghav Magazine
Hussein Mozannar
Akshay Nambi
...
Corby Rosset
Alexey Taymanov
Vibhav Vineet
Spencer Whitehead
Andrew Zhao
24
0
0
24 Nov 2025
Improved Sample Complexity for Full Coverage in Compact and Continuous Spaces
Lyu Yuhuan
124
3
0
21 Nov 2025
D-GARA: A Dynamic Benchmarking Framework for GUI Agent Robustness in Real-World Anomalies
S. Chen
Tong Zhao
Yi Bin
Fei Ma
Wenqi Shao
Z. Wang
77
0
0
20 Nov 2025
MEGA-GUI: Multi-stage Enhanced Grounding Agents for GUI Elements
SeokJoo Kwak
Jihoon Kim
Boyoun Kim
Jung Jae Yoon
Wooseok Jang
Jeonghoon Hong
Jaeho Yang
Yeong-Dae Kwon
100
0
0
17 Nov 2025
MM-CRITIC: A Holistic Evaluation of Large Multimodal Models as Multimodal Critique
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Gailun Zeng
Ziyang Luo
Hongzhan Lin
Yuchen Tian
Kaixin Li
Ziyang Gong
Jianxiong Guo
Jing Ma
64
1
0
12 Nov 2025
An Efficient Training Pipeline for Reasoning Graphical User Interface Agents
Georgios Pantazopoulos
Eda B. Özyiğit
LRM
246
0
0
11 Nov 2025
NVIDIA Nemotron Nano V2 VL
Nvidia
Amala Sanjay Deshmukh
Kateryna Chumachenko
Tuomas Rintamaki
Matthieu Le
...
Krzysztof Pawelec
Michael Evans
Katherine Luna
Jie Lou
Erick Galinkin
VLM
232
1
0
06 Nov 2025
GUI-360°: A Comprehensive Dataset and Benchmark for Computer-Using Agents
J. Mu
Chaoyun Zhang
Chiming Ni
Lu Wang
Bo Qiao
...
Yu Kang
Minghua Ma
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
ELM
271
0
0
06 Nov 2025
GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding
Shijie Zhou
Viet Dac Lai
Hao Tan
Jihyung Kil
Wanrong Zhu
Changyou Chen
Ruiyi Zhang
114
1
0
02 Nov 2025
HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration
Shaojie Zhang
Pei Fu
Ruoceng Zhang
Jiahui Yang
Anan Du
...
S. Wang
Ying Huang
Bin Qin
Zhenbo Luo
Jian Luan
80
0
0
31 Oct 2025
GUI Knowledge Bench: Revealing the Knowledge Gap Behind VLM Failures in GUI Tasks
Chenrui Shi
Zedong YU
Zhi Gao
Ruining Feng
Enqi Liu
Yuwei Wu
Yunde Jia
Liuyu Xiang
Zhaofeng He
Qing Li
77
0
0
30 Oct 2025
Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark
Ziyu Guo
Xinyan Chen
Renrui Zhang
Ruichuan An
Yu Qi
Dongzhi Jiang
Xiangtai Li
M. Zhang
Jiaming Song
Pheng-Ann Heng
VGen
LRM
112
7
0
30 Oct 2025
MGA: Memory-Driven GUI Agent for Observation-Centric Interaction
Weihua Cheng
Ersheng Ni
Wenlong Wang
Yifei Sun
Junming Liu
Wangyu Shen
Yirong Chen
Botian Shi
Ding Wang
LLMAG
LM&Ro
197
0
0
28 Oct 2025
Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents
Zihao Wang
X. Li
Yining Ye
Junjie Fang
Haoming Wang
...
Shi Yan
Xiangyang Li
Yitao Liang
Yujia Qin
Guang Shi
LLMAG
LM&Ro
AI4CE
116
2
0
27 Oct 2025
Mitigating Coordinate Prediction Bias from Positional Encoding Failures
Xingjian Tao
Yiwei Wang
Yujun Cai
Yihong Luo
Jing Tang
76
0
0
25 Oct 2025
UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning
Liangyu Chen
Zhengyu Ma
C. Cai
J. Zhang
Panrong Tong
...
Yuqi Liu
Wenxuan Wang
Yue Wang
Qin Jin
Steven C. H. Hoi
LRM
72
1
0
23 Oct 2025
A Coherence-Based Measure of AGI
Fares Fourati
61
0
0
23 Oct 2025
DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents
Kai Shi
Jun Yang
Ni Yang
Binqiang Pan
Qingsong Xie
Chao Zhang
ZhenYu Yang
T. Su
Haonan Lu
72
0
0
22 Oct 2025
FineVision: Open Data Is All You Need
Luis Wiedmann
Orr Zohar
Amir Mahla
Xiaohan Wang
Rui Li
Thibaud Frere
Leandro von Werra
Aritra Roy Gosthipaty
Andrés Marafioti
VLM
148
10
0
20 Oct 2025
GUIrilla: A Scalable Framework for Automated Desktop UI Exploration
Sofiya Garkot
Maksym Shamrai
Ivan Synytsia
Mariya Hirna
LLMAG
87
0
0
16 Oct 2025
Detect Anything via Next Point Prediction
Qing Jiang
Junan Huo
Xingyu Chen
Yuda Xiong
Zhaoyang Zeng
Yihao Chen
Tianhe Ren
Junzhi Yu
Lei Zhang
ObjD
162
9
0
14 Oct 2025
A Survey on Agentic Multimodal Large Language Models
Huanjin Yao
Ruifei Zhang
Jiaxing Huang
Jingyi Zhang
Yibo Wang
...
Ruolin Zhu
Yongcheng Jing
Shunyu Liu
Guanbin Li
Dacheng Tao
LM&Ro
AIFin
AI4TS
LRM
AI4CE
165
4
0
13 Oct 2025
ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding
Yuhang Li
Chenchen Zhang
Ruilin Lv
Ao Liu
K. Deng
Yuanxing Zhang
Jiaheng Liu
Wiggin Zhou
B. Zhou
LRM
55
2
0
13 Oct 2025
Auto-scaling Continuous Memory for GUI Agent
Wenyi Wu
Kun Zhou
Ruoxin Yuan
Vivian Yu
S. Wang
Zhiting Hu
Biwei Huang
60
0
0
10 Oct 2025
LLM-Based Data Science Agents: A Survey of Capabilities, Challenges, and Future Directions
Mizanur Rahman
Amran Bhuiyan
Mohammed Saidul Islam
Md Tahmid Rahman Laskar
Ridwan Mahbub
Ahmed Masry
Shafiq Joty
Enamul Hoque
LLMAG
AI4TS
LM&Ro
AI4CE
89
1
0
05 Oct 2025
GUI-Spotlight: Adaptive Iterative Focus Refinement for Enhanced GUI Visual Grounding
Bin Lei
Nuo Xu
Ali Payani
Mingyi Hong
C. Liao
Yu Cao
Caiwen Ding
52
1
0
05 Oct 2025
Improving GUI Grounding with Explicit Position-to-Coordinate Mapping
Suyuchen Wang
Tianyu Zhang
Ahmed Masry
Christopher Pal
Spandana Gella
Bang Liu
Perouz Taslakian
80
1
0
03 Oct 2025
GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness
Kung-Hsiang Huang
Haoyi Qiu
Yutong Dai
Caiming Xiong
Chien-Sheng Wu
94
1
0
01 Oct 2025
Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents
Zhen Yang
Zi-Yi Dou
Di Feng
Forrest Huang
Anh Nguyen
...
Chao Jia
Jeffrey Nichols
Alexander Toshev
Yinfei Yang
Zhe Gan
LLMAG
79
2
0
30 Sep 2025
RISK: A Framework for GUI Agents in E-commerce Risk Management
Renqi Chen
Zeyin Tao
Jianming Guo
Jingzhe Zhu
Yiheng Peng
Qingqing Sun
Tianyi Zhang
Shuai Chen
84
0
0
26 Sep 2025
Orcust: Stepwise-Feedback Reinforcement Learning for GUI Agent
Junyu Lu
Songxin Zhang
Zejian Xie
Zhuoyang Song
Jiaxing Zhang
OffRL
LRM
54
0
0
22 Sep 2025
GUI-ARP: Enhancing Grounding with Adaptive Region Perception for GUI Agents
Xianhang Ye
Yiqing Li
Wei Dai
Miancan Liu
Ziyuan Chen
...
Hongbo Min
Jinkui Ren
Xiantao Zhang
Wen Yang
Zhi Jin
96
3
0
19 Sep 2025
BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent
Shaojie Zhang
Ruoceng Zhang
Pei Fu
S. Wang
Jiahui Yang
...
Shiqi Cui
Bin Qin
Ying Huang
Zhenbo Luo
Jian Luan
LLMAG
MLLM
179
2
0
19 Sep 2025
InfraMind: A Novel Exploration-based GUI Agentic Framework for Mission-critical Industrial Management
Liangtao Lin
Zhaomeng Zhu
Tianwei Zhang
Yonggang Wen
AI4CE
113
2
0
17 Sep 2025
UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning
Zhengxi Lu
Jiabo Ye
Fei Tang
Yongliang Shen
Haiyang Xu
...
Weiming Lu
Ming Yan
Fei Huang
Jun Xiao
Yueting Zhuang
OffRL
OnRL
342
3
0
15 Sep 2025
How Auxiliary Reasoning Unleashes GUI Grounding in VLMs
Weiming Li
Yan Shao
Jing Yang
Yujing Lu
Ling Zhong
Y. Wang
Manni Duan
95
0
0
15 Sep 2025
Towards Understanding Visual Grounding in Visual Language Models
Georgios Pantazopoulos
Eda B. Özyiğit
ObjD
208
1
0
12 Sep 2025
Learning Active Perception via Self-Evolving Preference Optimization for GUI Grounding
Wanfu Wang
Qipeng Huang
Guangquan Xue
Xiaobo Liang
Juntao Li
VLM
80
1
0
04 Sep 2025
MobiAgent: A Systematic Framework for Customizable Mobile Agents
Cheng Zhang
Erhu Feng
Xi Zhao
Yisheng Zhao
Wangbo Gong
Jiahui Sun
Dong Du
Zhichao Hua
Yubin Xia
Haibo Chen
92
2
0
30 Aug 2025
InquireMobile: Teaching VLM-based Mobile Agent to Request Human Assistance via Reinforcement Fine-Tuning
Qihang Ai
Pi Bu
Yue Cao
Y. X. R. Wang
Jihao Gu
...
Wei Jiang
Zhicheng Zheng
Jun Song
Yuning Jiang
Bo Zheng
LRM
74
1
0
27 Aug 2025
DashboardQA: Benchmarking Multimodal Agents for Question Answering on Interactive Dashboards
Aaryaman Kartha
Ahmed Masry
Mohammed Saidul Islam
Thinh Lang
Shadikur Rahman
...
Mizanur Rahman
Mahir Ahmed
Md. Rizwan Parvez
Enamul Hoque
Shafiq Joty
60
0
0
24 Aug 2025
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers
Ziyang Luo
Zhiqi Shen
Wenzhuo Yang
Zirui Zhao
Prathyusha Jwalapuram
Amrita Saha
Doyen Sahoo
Silvio Savarese
Caiming Xiong
Junnan Li
ELM
140
20
0
20 Aug 2025
V2P: From Background Suppression to Center Peaking for Robust GUI Grounding Task
Jikai Chen
Long Chen
Dong Wang
Leilei Gan
Chenyi Zhuang
Jinjie Gu
69
1
0
19 Aug 2025
OpenCUA: Open Foundations for Computer-Use Agents
Xinyuan Wang
Bowen Wang
Dunjie Lu
Junlin Yang
Tianbao Xie
...
Victor Zhong
Flood Sung
Y.Charles
Zhilin Yang
Tao Yu
ELM
VLM
194
22
0
12 Aug 2025
Reinforcement Learning in Vision: A Survey
Weijia Wu
Chen Gao
Joya Chen
Kevin Lin
Qingwei Meng
Yiming Zhang
Yuke Qiu
Hong Zhou
Mike Zheng Shou
228
2
0
11 Aug 2025
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
Zeyi Sun
Ziyu Liu
Yuhang Zang
Yuhang Cao
Xiaoyi Dong
Tong Wu
Dahua Lin
Yuan Liu
LLMAG
199
11
0
06 Aug 2025
SEA: Self-Evolution Agent with Step-wise Reward for Computer Use
Liang Tang
Shuxian Li
Yuhao Cheng
Yukang Huo
Zhepeng Wang
Yiqiang Yan
Kaer Huang
Yanzhe Jing
Tiaonan Duan
166
6
0
06 Aug 2025
GuirlVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning
Weitai Kang
Bin Lei
Gaowen Liu
Caiwen Ding
Yan Yan
105
1
0
06 Aug 2025
CoAct-1: Computer-using Agents with Coding as Actions
Linxin Song
Yutong Dai
Viraj Prabhu
Jieyu Zhang
Taiwei Shi
...
Silvio Savarese
Zeyuan Chen
Jieyu Zhao
Ran Xu
Caiming Xiong
LLMAG
92
13
0
05 Aug 2025
NatureGAIA: Pushing the Frontiers of GUI Agents with a Challenging Benchmark and High-Quality Trajectory Dataset
Zihan Zheng
Tianle Cui
Chuwen Xie
Jiahui Zhang
Jiahui Pan
Lewei He
Qianglong Chen
LLMAG
152
0
0
02 Aug 2025