2308.10032
GameEval: Evaluating LLMs on Conversational Games
19 August 2023
Dan Qiao
Chenfei Wu
Yaobo Liang
Juntao Li
Nan Duan
ELM
LLMAG
Papers citing
"GameEval: Evaluating LLMs on Conversational Games"
16 / 16 papers shown
clem:todd: A Framework for the Systematic Benchmarking of LLM-Based Task-Oriented Dialogue System Realisations
Kranti Chalamalasetti
Sherzod Hakimov
David Schlangen
LLMAG
38
0
0
08 May 2025
EscapeBench: Pushing Language Models to Think Outside the Box
Cheng Qian
Peixuan Han
Qinyu Luo
Bingxiang He
X. Chen
...
Jiarui Yao
Xiaocheng Yang
Denghui Zhang
Yunzhu Li
Heng Ji
LLMAG
LRM
80
3
0
18 Dec 2024
Microscopic Analysis on LLM players via Social Deduction Game
Byungjun Kim
Dayeon Seo
Bugeun Kim
29
1
0
19 Aug 2024
How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics
Nidhir Bhavsar
Jonathan Jordan
Sherzod Hakimov
David Schlangen
16
0
0
20 Jun 2024
InterIntent: Investigating Social Intelligence of LLMs via Intention Understanding in an Interactive Game Context
Ziyi Liu
Abhishek Anand
Pei Zhou
Jen-tse Huang
Jieyu Zhao
70
4
0
18 Jun 2024
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents
Anthony Costarelli
Mat Allen
Roman Hauksson
Grace Sodunke
Suhas Hariharan
Carlson Cheng
Wenjie Li
Joshua Clymer
Arjun Yadav
ELM
ReLM
LLMAG
LRM
36
16
0
07 Jun 2024
clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents
Anne Beyer
Kranti Chalamalasetti
Sherzod Hakimov
Brielen Madureira
P. Sadler
David Schlangen
LLMAG
25
4
0
31 May 2024
Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf
Xuanfa Jin
Ziyan Wang
Yali Du
Meng Fang
Haifeng Zhang
Jun Wang
OffRL
LLMAG
46
5
0
30 May 2024
A Survey on Large Language Model-Based Game Agents
Sihao Hu
Tiansheng Huang
Gaowen Liu
Ramana Rao Kompella
Selim Furkan Tekin
Yichang Xu
Zachary Yahn
Ling Liu
LLMAG
LM&Ro
AI4CE
LM&MA
66
49
0
02 Apr 2024
LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models
Yadong Zhang
Shaoguang Mao
Tao Ge
Xun Wang
Adrian de Wynter
Yan Xia
Wenshan Wu
Ting Song
Man Lan
Furu Wei
LRM
78
48
0
01 Apr 2024
How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments
Jen-tse Huang
E. Li
Man Ho Lam
Tian Liang
Wenxuan Wang
Youliang Yuan
Wenxiang Jiao
Xing Wang
Zhaopeng Tu
Michael R. Lyu
ELM
LLMAG
77
32
0
18 Mar 2024
LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments
Junzhe Chen
Xuming Hu
Shuodi Liu
Shiyu Huang
Weijuan Tu
Zhaofeng He
Lijie Wen
ELM
LLMAG
46
9
0
26 Feb 2024
Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models
Tian Liang
Zhiwei He
Jen-tse Huang
Wenxuan Wang
Wenxiang Jiao
Rui Wang
Yujiu Yang
Zhaopeng Tu
Shuming Shi
Xing Wang
LLMAG
40
5
0
31 Oct 2023
Beyond Static Datasets: A Deep Interaction Approach to LLM Evaluation
Jiatong Li
Rui Li
Qi Liu
21
14
0
08 Sep 2023
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018