WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks

2 June 2025
Atsuyuki Miyai
Zaiying Zhao
Kazuki Egashira
Atsuki Sato
Tatsumi Sunada
Shota Onohara
Hiromasa Yamanishi
Mashiro Toyooka
Kunato Nishina
Ryoma Maeda
Kiyoharu Aizawa
Toshihiko Yamasaki
LLMAG
ArXiv (abs) | PDF | HTML | HuggingFace (10 upvotes)

Papers citing "WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks"

7 / 7 papers shown
WebGen-V Bench: Structured Representation for Enhancing Visual Design in LLM-based Web Generation and Evaluation
Kuang-Da Wang
Zhao Wang
Yotaro Shimose
Wei-Yao Wang
Shingo Takamatsu
3DV
84
0
0
17 Oct 2025
Limited-Angle Tomography Reconstruction via Projector Guided 3D Diffusion
Zhantao Deng
Mériem Er-Rafik
Anna Sushko
C. Hébert
Pascal Fua
DiffM, MedIm
100
0
0
07 Oct 2025
BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks
Sagnik Anupam
Davis Brown
Shuo Li
Eric Wong
Hamed Hassani
Osbert Bastani
LLMAG, ELM
191
1
0
02 Oct 2025
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
Siru Ouyang
Jun Yan
I-Hung Hsu
Yanfei Chen
Ke Jiang
...
Mahsan Rofouei
Hangfei Lin
Jiawei Han
Chen-Yu Lee
Tomas Pfister
LLMAG, CLL, LRM
128
8
0
29 Sep 2025
ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
Hanyu Lai
Xiao-Chang Liu
Yanxiao Zhao
Han Xu
Hanchen Zhang
Bohao Jing
Yanyu Ren
Shuntian Yao
Yuxiao Dong
Jie Tang
OffRL
140
11
0
19 Aug 2025
WebMall - A Multi-Shop Benchmark for Evaluating Web Agents [Technical Report]
Ralph Peeters
Aaron Steiner
Luca Schwarz
Julian Yuya Caspary
Christian Bizer
152
2
0
18 Aug 2025
ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation
Chenchen Zhang
Yuhang Li
Can Xu
Jiaheng Liu
Ao Liu
...
Zenan Xu
Yuanxing Zhang
Wiggin Zhou
Chayse Zhou
Fengzong Lian
139
8
0
07 Jul 2025