2506.01952
Cited By
WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks
2 June 2025
Atsuyuki Miyai
Zaiying Zhao
Kazuki Egashira
Atsuki Sato
Tatsumi Sunada
Shota Onohara
Hiromasa Yamanishi
Mashiro Toyooka
Kunato Nishina
Ryoma Maeda
Kiyoharu Aizawa
Toshihiko Yamasaki
LLMAG
Papers citing
"WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks"
7 / 7 papers shown
WebGen-V Bench: Structured Representation for Enhancing Visual Design in LLM-based Web Generation and Evaluation
Kuang-Da Wang
Zhao Wang
Yotaro Shimose
Wei-Yao Wang
Shingo Takamatsu
3DV
84
0
0
17 Oct 2025
Limited-Angle Tomography Reconstruction via Projector Guided 3D Diffusion
Zhantao Deng
Mériem Er-Rafik
Anna Sushko
C. Hébert
Pascal Fua
DiffM
MedIm
100
0
0
07 Oct 2025
BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks
Sagnik Anupam
Davis Brown
Shuo Li
Eric Wong
Hamed Hassani
Osbert Bastani
LLMAG
ELM
191
1
0
02 Oct 2025
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
Siru Ouyang
Jun Yan
I-Hung Hsu
Yanfei Chen
Ke Jiang
...
Mahsan Rofouei
Hangfei Lin
Jiawei Han
Chen-Yu Lee
Tomas Pfister
LLMAG
CLL
LRM
128
8
0
29 Sep 2025
ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
Hanyu Lai
Xiao-Chang Liu
Yanxiao Zhao
Han Xu
Hanchen Zhang
Bohao Jing
Yanyu Ren
Shuntian Yao
Yuxiao Dong
Jie Tang
OffRL
140
11
0
19 Aug 2025
WebMall - A Multi-Shop Benchmark for Evaluating Web Agents [Technical Report]
Ralph Peeters
Aaron Steiner
Luca Schwarz
Julian Yuya Caspary
Christian Bizer
152
2
0
18 Aug 2025
ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation
Chenchen Zhang
Yuhang Li
Can Xu
Jiaheng Liu
Ao Liu
...
Zenan Xu
Yuanxing Zhang
Wiggin Zhou
Chayse Zhou
Fengzong Lian
139
8
0
07 Jul 2025