ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2308.03688
  4. Cited By
AgentBench: Evaluating LLMs as Agents

AgentBench: Evaluating LLMs as Agents

7 August 2023
Xiao Liu
Hao Yu
Hanchen Zhang
Yifan Xu
Xuanyu Lei
Hanyu Lai
Yu Gu
Hangliang Ding
Kai Men
Kejuan Yang
Shudan Zhang
Xiang Deng
Aohan Zeng
Zhengxiao Du
Chenhui Zhang
Sheng Shen
S. Shen
Yu-Chuan Su
Huan Sun
Minlie Huang
Yuxiao Dong
Jie Tang
    ELM
    LLMAG
ArXivPDFHTML

Papers citing "AgentBench: Evaluating LLMs as Agents"

6 / 56 papers shown
Title
A Survey of Text Games for Reinforcement Learning informed by Natural
  Language
A Survey of Text Games for Reinforcement Learning informed by Natural Language
P. Osborne
Heido Nomm
André Freitas
AI4CE
13
24
0
20 Sep 2021
BEHAVIOR: Benchmark for Everyday Household Activities in Virtual,
  Interactive, and Ecological Environments
BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments
S. Srivastava
Chengshu Li
Michael Lingelbach
Roberto Martín-Martín
Fei Xia
...
C. Karen Liu
Silvio Savarese
H. Gweon
Jiajun Wu
Li Fei-Fei
LM&Ro
133
152
0
06 Aug 2021
Measuring Coding Challenge Competence With APPS
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
194
614
0
20 May 2021
Zero-Shot Text-to-Image Generation
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
The GEM Benchmark: Natural Language Generation, its Evaluation and
  Metrics
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann
Tosin P. Adewumi
Karmanya Aggarwal
Pawan Sasanka Ammanamanchi
Aremu Anuoluwapo
...
Nishant Subramani
Wei-ping Xu
Diyi Yang
Akhila Yerukola
Jiawei Zhou
VLM
238
284
0
02 Feb 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
Previous
12