2410.18967
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
24 October 2024
Zhangheng Li
Keen You
H. Zhang
Di Feng
Harsh Agrawal
Xiujun Li
Mohana Prasad Sathya Moorthy
Jeff Nichols
Y. Yang
Zhe Gan
MLLM
Papers citing
"Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms"
14 / 14 papers shown
Title
EcoAgent: An Efficient Edge-Cloud Collaborative Multi-Agent Framework for Mobile Automation
Biao Yi
Xavier Hu
Y. Chen
Shengyu Zhang
Hongxia Yang
Fan Wu
Fei Wu
LLMAG
48
0
0
08 May 2025
PixelWeb: The First Web GUI Dataset with Pixel-Wise Labels
Qi Yang
Weichen Bi
Haiyang Shen
Y. Guo
Yun Ma
32
0
0
23 Apr 2025
Navi-plus: Managing Ambiguous GUI Navigation Tasks with Follow-up
Ziming Cheng
Zhiyuan Huang
Junting Pan
Zhaohui Hou
Mingjie Zhan
33
0
0
31 Mar 2025
A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models
Liangbo Ning
Ziran Liang
Zhuohang Jiang
Haohao Qu
Yujuan Ding
...
Xiao Wei
Shanru Lin
Hui Liu
Philip S. Yu
Qing Li
LLMAG
LM&Ro
73
5
0
30 Mar 2025
GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration
Yuchen Sun
Shanhui Zhao
Tao Yu
Hao Wen
Samith Va
Mengwei Xu
Yuanchun Li
Chongyang Zhang
LLMAG
59
0
0
22 Mar 2025
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
Shravan Nayak
Xiangru Jian
Kevin Qinghong Lin
Juan A. Rodriguez
Montek Kalsi
...
David Vazquez
Christopher Pal
Perouz Taslakian
Spandana Gella
Sai Rajeswar
76
0
0
19 Mar 2025
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
Erik Daxberger
Nina Wenzel
David Griffiths
Haiming Gang
Justin Lazarow
...
Kai Kang
Marcin Eichner
Y. Yang
Afshin Dehghan
Peter Grasch
72
2
0
17 Mar 2025
DeskVision: Large Scale Desktop Region Captioning for Advanced GUI Agents
Yibin Xu
Liang Yang
Hao Chen
Hua Wang
Zhi Chen
Yaohua Tang
3DV
49
0
0
14 Mar 2025
FedMABench: Benchmarking Mobile Agents on Decentralized Heterogeneous User Data
Wenhao Wang
Zijie Yu
Rui Ye
J. Zhang
S. Chen
Yanfeng Wang
FedML
40
0
0
07 Mar 2025
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
Hao Tang
Chenwei Xie
Haiyang Wang
Xiaoyi Bao
Tingyu Weng
Pandeng Li
Yun Zheng
Liwei Wang
ObjD
VLM
52
0
0
03 Mar 2025
OmniVLM: A Token-Compressed, Sub-Billion-Parameter Vision-Language Model for Efficient On-Device Inference
Wei Chen
Zhiyuan Li
Shuo Xin
VLM
MLLM
75
0
0
16 Dec 2024
Falcon-UI: Understanding GUI Before Following User Instructions
Huawen Shen
Chang-Shu Liu
Gengluo Li
Xinlong Wang
Yu Zhou
Can Ma
Xiangyang Ji
LLMAG
74
4
0
12 Dec 2024
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Kevin Qinghong Lin
Linjie Li
Difei Gao
Z. Yang
Shiwei Wu
Zechen Bai
Weixian Lei
Lijuan Wang
Mike Zheng Shou
LLMAG
64
13
0
26 Nov 2024
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Boyu Gou
Ruohan Wang
Boyuan Zheng
Yanan Xie
Cheng Chang
Yiheng Shu
Huan Sun
Yu Su
LM&Ro
LLMAG
53
48
0
07 Oct 2024