Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.16499
Cited By
LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments
26 February 2024
Junzhe Chen
Xuming Hu
Shuodi Liu
Shiyu Huang
Weijuan Tu
Zhaofeng He
Lijie Wen
ELM
LLMAG
Re-assign community
ArXiv
PDF
HTML
Papers citing
"LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments"
13 / 13 papers shown
Title
VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making
Jake Grigsby
Yuke Zhu
Michael S Ryoo
Juan Carlos Niebles
OffRL
VLM
31
0
0
06 May 2025
A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions
Emre Can Acikgoz
Cheng Qian
Hongru Wang
Vardhan Dongre
X. Chen
Heng Ji
Dilek Hakkani-Tür
Gökhan Tür
LM&Ro
ELM
43
1
0
07 Apr 2025
Evaluating Stability of Unreflective Alignment
James Lucassen
Mark Henry
Philippa Wright
Owen Yeung
25
0
0
27 Aug 2024
From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future
Haolin Jin
Linghan Huang
Haipeng Cai
Jun Yan
Bo Li
Huaming Chen
61
24
0
05 Aug 2024
BMW Agents -- A Framework For Task Automation Through Multi-Agent Collaboration
Noel Crawford
Edward B. Duffy
Iman Evazzade
Torsten Foehr
Gregory Robbins
D. K. Saha
Jiya Varma
Marcin Ziolkowski
LLMAG
38
3
0
28 Jun 2024
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents
Anthony Costarelli
Mat Allen
Roman Hauksson
Grace Sodunke
Suhas Hariharan
Carlson Cheng
Wenjie Li
Joshua Clymer
Arjun Yadav
ELM
ReLM
LLMAG
LRM
28
16
0
07 Jun 2024
LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models
Yadong Zhang
Shaoguang Mao
Tao Ge
Xun Wang
Adrian de Wynter
Yan Xia
Wenshan Wu
Ting Song
Man Lan
Furu Wei
LRM
75
48
0
01 Apr 2024
GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations
Jinhao Duan
Renming Zhang
James Diffenderfer
B. Kailkhura
Lichao Sun
Elias Stengel-Eskin
Mohit Bansal
Tianlong Chen
Kaidi Xu
ELM
LRM
26
55
0
19 Feb 2024
Don't Make Your LLM an Evaluation Benchmark Cheater
Kun Zhou
Yutao Zhu
Zhipeng Chen
Wentong Chen
Wayne Xin Zhao
Xu Chen
Yankai Lin
Ji-Rong Wen
Jiawei Han
ELM
105
136
0
03 Nov 2023
On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring
Dylan J. Foster
Dean Phillips Foster
Noah Golowich
Alexander Rakhlin
32
14
0
01 May 2023
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
209
1,701
0
07 Apr 2023
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
297
3,163
0
21 Mar 2022
1