arXiv:2504.10445

RealWebAssist: A Benchmark for Long-Horizon Web Assistance with Real-World Users

14 April 2025

Papers citing "RealWebAssist: A Benchmark for Long-Horizon Web Assistance with Real-World Users"

10 / 10 papers shown

Evaluating Long-Context Reasoning in LLM-Based WebAgents

120

1

0

03 Dec 2025

OpenApps: Simulating Environment Variations to Measure UI-Agent Reliability

Arjun Subramonian

Nikolaos Tsilivis

Randall Balestriero

117

0

0

25 Nov 2025

Can Agent Conquer Web? Exploring the Frontiers of ChatGPT Atlas Agent in Web Games

LLMAG LM&Ro LRM

170

1

0

30 Oct 2025

ColorBench: Benchmarking Mobile Agents with Graph-Structured Framework for Complex Long-Horizon Tasks

...

Zhuosheng Zhang

125

1

0

16 Oct 2025

Interaction-Driven Browsing: A Human-in-the-Loop Conceptual Framework Informed by Human Web Browsing for Browser-Using Agents

152

1

0

15 Sep 2025

NatureGAIA: Pushing the Frontiers of GUI Agents with a Challenging Benchmark and High-Quality Trajectory Dataset

187

0

0

02 Aug 2025

Phi-Ground Tech Report: Advancing Perception in GUI Grounding

...

234

10

0

31 Jul 2025

EconWebArena: Benchmarking Autonomous Agents on Economic Tasks in Realistic Web Environments

207

5

0

09 Jun 2025

MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation

...

186

1

0

27 May 2025

VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks

International Conference on Learning Representations (ICLR), 2024

Rogerio Bonatti

413

25

0

24 Oct 2024
