arXiv:2305.13455
Clembench: Using Game Play to Evaluate Chat-Optimized Language Models as Conversational Agents
22 May 2023
Kranti Chalamalasetti
Jana Gotze
Sherzod Hakimov
Brielen Madureira
P. Sadler
David Schlangen
ELM
ALM
LLMAG
Papers citing "Clembench: Using Game Play to Evaluate Chat-Optimized Language Models as Conversational Agents"
26 / 26 papers shown
clem:todd: A Framework for the Systematic Benchmarking of LLM-Based Task-Oriented Dialogue System Realisations
Chalamalasetti Kranti
Sherzod Hakimov
David Schlangen
LLMAG
38
0
0
08 May 2025
TextArena
Leon Guertler
Bobby Cheng
Simon Yu
Bo Liu
Leshem Choshen
Cheston Tan
LLMAG
33
0
0
15 Apr 2025
Playpen: An Environment for Exploring Learning Through Conversational Interaction
Nicola Horst
Davide Mazzaccara
Antonia Schmidt
Michael Sullivan
Filippo Momentè
...
Alexander Koller
Oliver Lemon
David Schlangen
Mario Giulianelli
Alessandro Suglia
OffRL
32
0
0
11 Apr 2025
Ad-hoc Concept Forming in the Game Codenames as a Means for Evaluating Large Language Models
Sherzod Hakimov
Lara Pfennigschmidt
David Schlangen
ELM
53
0
0
17 Feb 2025
Beyond Outcomes: Transparent Assessment of LLM Reasoning in Games
Wenye Lin
Jonathan Roberts
Yunhan Yang
Samuel Albanie
Zongqing Lu
Kai Han
LRM
ELM
56
1
0
18 Dec 2024
Strategic Insights in Human and Large Language Model Tactics at Word Guessing Games
Matīss Rikters
Sanita Reinsone
13
0
0
17 Sep 2024
Predicting the Target Word of Game-playing Conversations using a Low-Rank Dialect Adapter for Decoder Models
Dipankar Srirag
Aditya Joshi
Jacob Eisenstein
42
1
0
31 Aug 2024
Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs
Yilun Hua
Yoav Artzi
39
3
0
02 Aug 2024
How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics
Nidhir Bhavsar
Jonathan Jordan
Sherzod Hakimov
David Schlangen
16
0
0
20 Jun 2024
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents
Anthony Costarelli
Mat Allen
Roman Hauksson
Grace Sodunke
Suhas Hariharan
Carlson Cheng
Wenjie Li
Joshua Clymer
Arjun Yadav
ELM
ReLM
LLMAG
LRM
30
16
0
07 Jun 2024
clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents
Anne Beyer
Kranti Chalamalasetti
Sherzod Hakimov
Brielen Madureira
P. Sadler
David Schlangen
LLMAG
19
4
0
31 May 2024
Evaluating Dialect Robustness of Language Models via Conversation Understanding
Dipankar Srirag
Aditya Joshi
31
1
0
09 May 2024
From Persona to Personalization: A Survey on Role-Playing Language Agents
Jiangjie Chen
Xintao Wang
Rui Xu
Siyu Yuan
Yikai Zhang
...
Caiyu Hu
Siye Wu
Scott Ren
Ziquan Fu
Yanghua Xiao
50
72
0
28 Apr 2024
AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents
Luca Gioacchini
G. Siracusano
D. Sanvito
Kiril Gashteovski
David Friede
Roberto Bifulco
Carolin (Haas) Lawrence
ELM
LLMAG
39
10
0
09 Apr 2024
Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization
Wenqi Zhang
Ke Tang
Hai Wu
Mengna Wang
Yongliang Shen
Guiyang Hou
Zeqi Tan
Peng Li
Y. Zhuang
Weiming Lu
LLMAG
25
33
0
27 Feb 2024
GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations
Jinhao Duan
Renming Zhang
James Diffenderfer
B. Kailkhura
Lichao Sun
Elias Stengel-Eskin
Mohit Bansal
Tianlong Chen
Kaidi Xu
ELM
LRM
32
55
0
19 Feb 2024
DialogBench: Evaluating LLMs as Human-like Dialogue Systems
Jiao Ou
Junda Lu
Che Liu
Yihong Tang
Fuzheng Zhang
Di Zhang
Kun Gai
ALM
LM&MA
22
14
0
03 Nov 2023
On General Language Understanding
David Schlangen
24
1
0
27 Oct 2023
Leveraging Large Language Model for Automatic Evolving of Industrial Data-Centric R&D Cycle
Xu Yang
Xiao Yang
Weiqing Liu
Jinhui Li
Peng Yu
Zeqi Ye
Jiang Bian
13
0
0
17 Oct 2023
Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena
Jiangjie Chen
Siyu Yuan
Rong Ye
Bodhisattwa Prasad Majumder
Kyle Richardson
LLMAG
ELM
25
54
0
09 Oct 2023
A Survey on Large Language Model based Autonomous Agents
Lei Wang
Chengbang Ma
Xueyang Feng
Zeyu Zhang
Hao-ran Yang
...
Xu Chen
Yankai Lin
Wayne Xin Zhao
Zhewei Wei
Ji-Rong Wen
LLMAG
AI4CE
LM&Ro
39
1,088
0
22 Aug 2023
RoCo: Dialectic Multi-Robot Collaboration with Large Language Models
Zhao Mandi
Shreeya Jain
Shuran Song
LM&Ro
LLMAG
23
121
0
10 Jul 2023
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
215
1,701
0
07 Apr 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
206
2,232
0
22 Mar 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
The slurk Interaction Server Framework: Better Data for Better Dialog Models
Jana Gotze
Maike Paetzel-Prusmann
Wencke Liermann
Tim Diekmann
David Schlangen
VLM
19
11
0
02 Feb 2022