REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites

15 April 2025
Divyansh Garg
Shaun VanWeelden
Diego Caples
Andis Draguns
Nikil Ravi
Pranav Putta
Naman Garg
Tomas Abraham
Michael Lara
Federico Lopez
James Liu
Atharva Gundawar
Prannay Hebbar
Youngchul Joo
Jindong Gu
Charles London
Christian Schroeder de Witt
S. Motwani

Papers citing "REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites"

20 / 20 papers shown
OpenApps: Simulating Environment Variations to Measure UI-Agent Reliability
Karen Ullrich
Jingtong Su
Claudia Shi
Arjun Subramonian
Amir Bar
Ivan Evtimov
Nikolaos Tsilivis
Randall Balestriero
Julia Kempe
Mark Ibrahim
44
0
0
25 Nov 2025
Fara-7B: An Efficient Agentic Model for Computer Use
Ahmed Awadallah
Yash Lara
Raghav Magazine
Hussein Mozannar
Akshay Nambi
...
Corby Rosset
Alexey Taymanov
Vibhav Vineet
Spencer Whitehead
Andrew Zhao
40
0
0
24 Nov 2025
UI-CUBE: Enterprise-Grade Computer Use Agent Benchmarking Beyond Task Accuracy to Operational Reliability
Horia Cristescu
Charles Park
Trong Canh Nguyen
Sergiu Talmacel
Alexandru-Gabriel Ilie
Stefan Adam
ELM
100
0
0
21 Nov 2025
Towards Outcome-Oriented, Task-Agnostic Evaluation of AI Agents
Waseem Alshikh
Muayad Ali
Brian Kennedy
Dmytro Mozolevskyi
28
0
0
11 Nov 2025
SCUBA: Salesforce Computer Use Benchmark
Yutong Dai
Krithika Ramakrishnan
Jing Gu
M. Fernández
Yanqi Luo
...
Zhenyu Hu
Silvio Savarese
Caiming Xiong
Zeyuan Chen
Ran Xu
ELM
111
1
0
30 Sep 2025
WAREX: Web Agent Reliability Evaluation on Existing Benchmarks
Su Kara
Fazle Faisal
Suman Nath
100
0
0
28 Sep 2025
WebMall - A Multi-Shop Benchmark for Evaluating Web Agents [Technical Report]
Ralph Peeters
Aaron Steiner
Luca Schwarz
Julian Yuya Caspary
Christian Bizer
96
0
0
18 Aug 2025
NatureGAIA: Pushing the Frontiers of GUI Agents with a Challenging Benchmark and High-Quality Trajectory Dataset
Zihan Zheng
Tianle Cui
Chuwen Xie
Jiahui Zhang
Jiahui Pan
Lewei He
Qianglong Chen
LLMAG
152
0
0
02 Aug 2025
WebArXiv: Evaluating Multimodal Agents on Time-Invariant arXiv Tasks
Zihao Sun
Ling Chen
LLMAG
102
0
0
01 Jul 2025
Evolutionary Perspectives on the Evaluation of LLM-Based AI Agents: A Comprehensive Survey
Jiachen Zhu
Menghui Zhu
Renting Rui
Rong Shan
Congmin Zheng
...
Jianghao Lin
Weiwen Liu
Ruiming Tang
Yong Yu
Weinan Zhang
LLMAG ELM
210
6
0
06 Jun 2025
RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
Zeyi Liao
Jaylen Jones
Linxi Jiang
Eric Fosler-Lussier
Yu-Chuan Su
Zhiqiang Lin
Huan Sun
ELM
311
10
0
28 May 2025
Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents
Christian Schroeder de Witt
AAML AI4CE
1.0K
29
0
04 May 2025
Survey on Evaluation of LLM-based Agents
Asaf Yehudai
Lilach Eden
Alan Li
Guy Uziel
Yilun Zhao
Roy Bar-Haim
Arman Cohan
Michal Shmueli-Scheuer
LLMAG ELM
413
62
0
20 Mar 2025
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
Yifei Zhou
Song Jiang
Yuandong Tian
Jason Weston
Sergey Levine
Sainbayar Sukhbaatar
Xian Li
LLMAG LRM
323
45
0
19 Mar 2025
AI Agents: Evolution, Architecture, and Real-World Applications
Naveen Krishnan
LLMAG LM&Ro AI4TS AI4CE
142
30
0
16 Mar 2025
Towards Enterprise-Ready Computer Using Generalist Agent
Sami Marreed
Alon Oved
Avi Yaeli
Segev Shlomov
Ido Levy
Aviad Sela
Asaf Adi
Nir Mashkif
LLMAG
251
11
0
24 Feb 2025
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
Robert Z. Sparks
Charlie Snell
Kanishk Gandhi
Alon Albalak
Anikait Singh
...
Dakota Mahan
Louis Castricato
Jan-Philipp Fränken
Nick Haber
Chelsea Finn
LRM
298
78
0
08 Jan 2025
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Frank F. Xu
Yufan Song
Boxuan Li
Yuxuan Tang
Kritanjali Jain
...
Wayne Chi
Lawrence Jang
Yiqing Xie
Shuyan Zhou
Graham Neubig
ELM
567
87
0
18 Dec 2024
Beyond Browsing: API-Based Web Agents
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yueqi Song
Frank F. Xu
Shuyan Zhou
Graham Neubig
481
44
0
21 Oct 2024
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents
International Conference on Learning Representations (ICLR), 2024
Ke Yang
Yao Liu
Sapana Chaudhary
Rasool Fakoor
Pratik Chaudhari
George Karypis
Huzefa Rangwala
LLMAG LM&Ro
438
59
0
17 Oct 2024