Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2404.05955
Cited By
VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?
9 April 2024
Junpeng Liu
Yifan Song
Bill Yuchen Lin
Wai Lam
Graham Neubig
Yuanzhi Li
Xiang Yue
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
10 / 10 papers shown
Title
VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms
Seungwon Lim
Sungwoong Kim
Jihwan Yu
Sungjae Lee
Jiwan Chung
Youngjae Yu
54
1
0
18 Mar 2025
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang
Reuben Tan
Qianhui Wu
Ruijie Zheng
Baolin Peng
...
Seonghyeon Ye
Joel Jang
Yuquan Deng
Lars Liden
Jianfeng Gao
VLM
AI4TS
89
8
0
18 Feb 2025
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Zhangheng Li
Keen You
H. Zhang
Di Feng
Harsh Agrawal
Xiujun Li
Mohana Prasad Sathya Moorthy
Jeff Nichols
Y. Yang
Zhe Gan
MLLM
29
18
0
24 Oct 2024
Beyond Browsing: API-Based Web Agents
Yueqi Song
Frank F. Xu
Shuyan Zhou
Graham Neubig
29
13
0
21 Oct 2024
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Kaichen Zhang
Bo Li
Peiyuan Zhang
Fanyi Pu
Joshua Adrian Cahyono
...
Shuai Liu
Yuanhan Zhang
Jingkang Yang
Chunyuan Li
Ziwei Liu
79
73
0
17 Jul 2024
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Kanzhi Cheng
Qiushi Sun
Yougang Chu
Fangzhi Xu
Yantao Li
Jianbing Zhang
Zhiyong Wu
LLMAG
162
137
0
17 Jan 2024
CogAgent: A Visual Language Model for GUI Agents
Wenyi Hong
Weihan Wang
Qingsong Lv
Jiazheng Xu
Wenmeng Yu
...
Juanzi Li
Bin Xu
Yuxiao Dong
Ming Ding
Jie Tang
MLLM
132
310
0
14 Dec 2023
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye
Haiyang Xu
Jiabo Ye
Mingshi Yan
Anwen Hu
Haowei Liu
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
114
367
0
07 Nov 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
Bryan Wang
Gang Li
Xin Zhou
Zhourong Chen
Tovi Grossman
Yang Li
150
152
0
07 Aug 2021
1