2409.08264
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
12 September 2024
Rogerio Bonatti
Dan Zhao
Francesco Bonacci
Dillon Dupont
Sara Abdali
Yinheng Li
Yadong Lu
Justin Wagle
K. Koishida
A. Bucker
Lawrence Jang
Zack Hui
LLMAG
Papers citing
"Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale"
18 / 18 papers shown
Title
Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents
Christian Schroeder de Witt
AAML
AI4CE
56
0
0
04 May 2025
UFO2: The Desktop AgentOS
Chaoyun Zhang
He Huang
Chiming Ni
J. Mu
Si Qin
...
Minghua Ma
Jian-Guang Lou
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
LLMAG
34
0
0
20 Apr 2025
InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners
Yuhang Liu
Pengxiang Li
C. Xie
Xavier Hu
Xiaotian Han
Shengyu Zhang
Hongxia Yang
Fei Wu
LLMAG
LM&Ro
LRM
AI4CE
57
1
0
19 Apr 2025
Evaluating the Goal-Directedness of Large Language Models
Tom Everitt
Cristina Garbacea
Alexis Bellot
Jonathan G. Richens
Henry Papadatos
Simeon Campos
Rohin Shah
ELM
LM&MA
LM&Ro
LRM
68
0
0
16 Apr 2025
Kimi-VL Technical Report
Kimi Team
Angang Du
B. Yin
Bowei Xing
Bowen Qu
...
Zhiqi Huang
Zihao Huang
Zijia Zhao
Z. Chen
Zongyu Lin
MLLM
VLM
MoE
106
0
0
10 Apr 2025
V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models
Xiangxi Zheng
Linjie Li
Z. Yang
Ping Yu
Alex Jinpeng Wang
Rui Yan
Yuan Yao
Lijuan Wang
LRM
21
0
0
08 Apr 2025
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents
Saaket Agashe
Kyle Wong
Vincent Tu
Jiachen Yang
Ang Li
Xin Eric Wang
LLMAG
60
1
0
01 Apr 2025
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
Shravan Nayak
Xiangru Jian
Kevin Qinghong Lin
Juan A. Rodriguez
Montek Kalsi
...
David Vazquez
Christopher Pal
Perouz Taslakian
Spandana Gella
Sai Rajeswar
85
0
0
19 Mar 2025
Attacking Multimodal OS Agents with Malicious Image Patches
Lukas Aichberger
Alasdair Paren
Y. Gal
Philip H. S. Torr
Adel Bibi
AAML
51
2
0
13 Mar 2025
Programming with Pixels: Computer-Use Meets Software Engineering
Pranjal Aggarwal
Sean Welleck
38
0
0
24 Feb 2025
Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
Vardaan Pahuja
Yadong Lu
Corby Rosset
Boyu Gou
Arindam Mitra
Spencer Whitehead
Yu Su
Ahmed Awadallah
LLMAG
LM&Ro
Presented at ResearchTrend Connect | LLMAG on 14 Mar 2025
145
3
1
20 Feb 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
83
10
0
06 Jan 2025
PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World
Yanheng He
Jiahe Jin
Shijie Xia
Jiadi Su
Runze Fan
Haoyang Zou
Xiangkun Hu
Pengfei Liu
LLMAG
38
2
0
23 Dec 2024
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Frank F. Xu
Yufan Song
Boxuan Li
Yuxuan Tang
Kritanjali Jain
...
Wayne Chi
Lawrence Jang
Yiqing Xie
Shuyan Zhou
Graham Neubig
LLMAG
124
20
0
18 Dec 2024
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Zhangheng Li
Keen You
H. Zhang
Di Feng
Harsh Agrawal
Xiujun Li
Mohana Prasad Sathya Moorthy
Jeff Nichols
Y. Yang
Zhe Gan
MLLM
46
18
0
24 Oct 2024
ShapefileGPT: A Multi-Agent Large Language Model Framework for Automated Shapefile Processing
Qingming Lin
Rui Hu
Huaxia Li
Sensen Wu
Yadong Li
Kai Fang
Hailin Feng
Zhenhong Du
Liuchang Xu
LLMAG
AI4CE
25
2
0
16 Oct 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Christopher Rawles
Sarah Clinckemaillie
Yifan Chang
Jonathan Waltz
Gabrielle Lau
...
Daniel Toyama
Robert Berry
Divya Tyamagundlu
Timothy Lillicrap
Oriana Riva
LLMAG
57
44
0
23 May 2024
UFO: A UI-Focused Agent for Windows OS Interaction
Chaoyun Zhang
Liqun Li
Shilin He
Xu Zhang
Bo Qiao
...
Yu Kang
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
Qi Zhang
LLMAG
58
65
0
08 Feb 2024