ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2404.06474
  4. Cited By
Autonomous Evaluation and Refinement of Digital Agents

Autonomous Evaluation and Refinement of Digital Agents

9 April 2024
Jiayi Pan
Yichi Zhang
Nicholas Tomlin
Yifei Zhou
Sergey Levine
Alane Suhr
    ELM
ArXivPDFHTML

Papers citing "Autonomous Evaluation and Refinement of Digital Agents"

19 / 19 papers shown
Title
A Survey on Large Language Model based Human-Agent Systems
A Survey on Large Language Model based Human-Agent Systems
Henry Peng Zou
Wei-Chieh Huang
Yaozu Wu
Yankai Chen
Chunyu Miao
...
Y. Li
Yuwei Cao
Dongyuan Li
Renhe Jiang
Philip S. Yu
LLMAG
LM&Ro
LM&MA
79
0
0
01 May 2025
AndroidGen: Building an Android Language Agent under Data Scarcity
AndroidGen: Building an Android Language Agent under Data Scarcity
Hanyu Lai
Junjie Gao
Xiao-Yang Liu
Y. Xu
S. Zhang
Yuxiao Dong
Jie Tang
LLMAG
72
0
0
27 Apr 2025
ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines
ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines
Tengjun Jin
Yuxuan Zhu
Daniel Kang
LMTD
ELM
40
0
0
07 Apr 2025
An Illusion of Progress? Assessing the Current State of Web Agents
An Illusion of Progress? Assessing the Current State of Web Agents
Tianci Xue
Weijian Qi
Tianneng Shi
Chan Hee Song
Boyu Gou
D. Song
Huan Sun
Yu Su
LLMAG
ELM
79
4
1
02 Apr 2025
Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks
Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks
Lutfi Eren Erdogan
Nicholas Lee
Sehoon Kim
Suhong Moon
Hiroki Furuta
Gopala Anumanchipalli
K. K.
Amir Gholami
LLMAG
LM&Ro
AIFin
68
2
0
12 Mar 2025
Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
Vardaan Pahuja
Yadong Lu
Corby Rosset
Boyu Gou
Arindam Mitra
Spencer Whitehead
Yu Su
Ahmed Awadallah
LLMAG
LM&Ro
Presented at ResearchTrend Connect | LLMAG on 14 Mar 2025
138
3
1
20 Feb 2025
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Zehan Qi
Xiao-Chang Liu
Iat Long Iong
Hanyu Lai
X. Sun
...
Shuntian Yao
Tianjie Zhang
Wei Xu
J. Tang
Yuxiao Dong
90
14
0
28 Jan 2025
Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
Fangyu Lei
Jixuan Chen
Yuxiao Ye
Ruisheng Cao
Dongchan Shin
...
Caiming Xiong
Ruoxi Sun
Qian Liu
Sida I. Wang
Tao Yu
LMTD
71
20
0
12 Nov 2024
Beyond Browsing: API-Based Web Agents
Beyond Browsing: API-Based Web Agents
Yueqi Song
Frank F. Xu
Shuyan Zhou
Graham Neubig
32
13
0
21 Oct 2024
SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation
SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation
Jingxuan Chen
Derek Yuen
Bin Xie
Y. Yang
Gongwei Chen
...
Liqiang Nie
Yasheng Wang
Jianye Hao
Jun Wang
Kun Shao
LLMAG
23
5
0
19 Oct 2024
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
Hyungjoo Chae
Namyoung Kim
Kai Tzu-iunn Ong
Minju Gwak
Gwanwoo Song
Jihoon Kim
S. Kim
Dongha Lee
Jinyoung Yeo
LLMAG
30
13
0
17 Oct 2024
From Grounding to Planning: Benchmarking Bottlenecks in Web Agents
From Grounding to Planning: Benchmarking Bottlenecks in Web Agents
Segev Shlomov
Ben wiesel
Aviad Sela
Ido Levy
Liane Galanti
Roy Abitbol
LLMAG
18
3
0
03 Sep 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Christopher Rawles
Sarah Clinckemaillie
Yifan Chang
Jonathan Waltz
Gabrielle Lau
...
Daniel Toyama
Robert Berry
Divya Tyamagundlu
Timothy Lillicrap
Oriana Riva
LLMAG
48
44
0
23 May 2024
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
Zhiyong Wu
Chengcheng Han
Zichen Ding
Zhenmin Weng
Zhoumianze Liu
Shunyu Yao
Tao Yu
Lingpeng Kong
LLMAG
LM&Ro
107
83
0
12 Feb 2024
UFO: A UI-Focused Agent for Windows OS Interaction
UFO: A UI-Focused Agent for Windows OS Interaction
Chaoyun Zhang
Liqun Li
Shilin He
Xu Zhang
Bo Qiao
...
Yu Kang
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
Qi Zhang
LLMAG
58
65
0
08 Feb 2024
CogAgent: A Visual Language Model for GUI Agents
CogAgent: A Visual Language Model for GUI Agents
Wenyi Hong
Weihan Wang
Qingsong Lv
Jiazheng Xu
Wenmeng Yu
...
Juanzi Li
Bin Xu
Yuxiao Dong
Ming Ding
Jie Tang
MLLM
132
310
0
14 Dec 2023
Enabling Conversational Interaction with Mobile UI using Large Language
  Models
Enabling Conversational Interaction with Mobile UI using Large Language Models
Bryan Wang
Gang Li
Yang Li
171
132
0
18 Sep 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
1