2504.10445
RealWebAssist: A Benchmark for Long-Horizon Web Assistance with Real-World Users
14 April 2025
Suyu Ye
Haojun Shi
Darren Shih
Hyokun Yun
Tanya Roosta
Tianmin Shu
Papers citing "RealWebAssist: A Benchmark for Long-Horizon Web Assistance with Real-World Users"
10 / 10 papers shown
Evaluating Long-Context Reasoning in LLM-Based WebAgents
Andy Chung
Yichi Zhang
Kaixiang Lin
Aditya Rawal
Qiaozi Gao
Joyce Chai
LLMAG
LRM
120
1
0
03 Dec 2025
OpenApps: Simulating Environment Variations to Measure UI-Agent Reliability
Karen Ullrich
Jingtong Su
Claudia Shi
Arjun Subramonian
Amir Bar
Ivan Evtimov
Nikolaos Tsilivis
Randall Balestriero
Julia Kempe
Mark Ibrahim
117
0
0
25 Nov 2025
Can Agent Conquer Web? Exploring the Frontiers of ChatGPT Atlas Agent in Web Games
Jingran Zhang
Ning Li
Justin Cui
LLMAG
LM&Ro
LRM
170
1
0
30 Oct 2025
ColorBench: Benchmarking Mobile Agents with Graph-Structured Framework for Complex Long-Horizon Tasks
Yuanyi Song
Heyuan Huang
Qiqiang Lin
Yin Zhao
Xiangmou Qu
...
Zhuosheng Zhang
Jun Wang
Yong Yu
Weinan Zhang
Zhaoxiang Wang
LLMAG
OffRL
125
1
0
16 Oct 2025
Interaction-Driven Browsing: A Human-in-the-Loop Conceptual Framework Informed by Human Web Browsing for Browser-Using Agents
Hyeonggeun Yun
Jinkyu Jang
152
1
0
15 Sep 2025
NatureGAIA: Pushing the Frontiers of GUI Agents with a Challenging Benchmark and High-Quality Trajectory Dataset
Zihan Zheng
Tianle Cui
Chuwen Xie
Jiahui Zhang
Jiahui Pan
Lewei He
Qianglong Chen
LLMAG
187
0
0
02 Aug 2025
Phi-Ground Tech Report: Advancing Perception in GUI Grounding
Miaosen Zhang
Ziqiang Xu
Jialiang Zhu
Qi Dai
Kai Qiu
...
Chong Luo
Tianyi Chen
Justin Wagle
Tim Franklin
Baining Guo
LRM
234
10
0
31 Jul 2025
EconWebArena: Benchmarking Autonomous Agents on Economic Tasks in Realistic Web Environments
Zefang Liu
Yinzhu Quan
207
5
0
09 Jun 2025
MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation
Chenghao Yang
Yinbo Luo
Zhoufutu Wen
Qi Chu
Tao Gong
...
Kaiyuan Zhang
Jianpeng Jiao
Ge Zhang
Wenhao Huang
Nenghai Yu
LLMAG
LRM
186
1
0
27 May 2025
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks
International Conference on Learning Representations (ICLR), 2024
Lawrence Jang
Yinheng Li
Charles Ding
Justin Lin
Paul Pu Liang
Dan Zhao
Rogerio Bonatti
K. Koishida
413
25
0
24 Oct 2024