SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents

18 October 2023
Xuhui Zhou
Hao Zhu
Leena Mathur
Ruohong Zhang
Haofei Yu
Zhengyang Qi
Louis-Philippe Morency
Yonatan Bisk
Daniel Fried
Graham Neubig
Maarten Sap
    LLMAG

Papers citing "SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents"

24 / 24 papers shown
A Survey on Large Language Model based Human-Agent Systems
Henry Peng Zou
Wei-Chieh Huang
Yaozu Wu
Yankai Chen
Chunyu Miao
...
Y. Li
Yuwei Cao
Dongyuan Li
Renhe Jiang
Philip S. Yu
LLMAG
LM&Ro
LM&MA
79
0
0
01 May 2025
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models
Bang Zhang
Ruotian Ma
Qingxuan Jiang
Peisong Wang
Jiaqi Chen
...
Fanghua Ye
Jian Li
Yifan Yang
Zhaopeng Tu
Xiaolong Li
LLMAG
ELM
ALM
100
25
1
01 May 2025
Auto-SLURP: A Benchmark Dataset for Evaluating Multi-Agent Frameworks in Smart Personal Assistant
Lei Shen
Xiaoyu Shen
56
0
0
25 Apr 2025
EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety
Jiahao Qiu
Yinghui He
Xinzhe Juan
Y. Wang
Y. Liu
Zixin Yao
Yue Wu
Xun Jiang
L. Yang
Mengdi Wang
AI4MH
68
0
0
13 Apr 2025
How Social is It? A Benchmark for LLMs' Capabilities in Multi-user Multi-turn Social Agent Tasks
Yusen Wu
Junwu Xiong
Xiaotie Deng
LLMAG
36
0
0
04 Apr 2025
Exploring and Controlling Diversity in LLM-Agent Conversation
Kuanchao Chu
Yi-Pei Chen
Hideki Nakayama
LLMAG
42
1
0
24 Feb 2025
InfoQuest: Evaluating Multi-Turn Dialogue Agents for Open-Ended Conversations with Hidden Context
Bryan L. M. de Oliveira
Luana G. B. Martins
Bruno Brandão
L. Melo
ELM
116
1
0
17 Feb 2025
Large Language Models can Achieve Social Balance
Pedro Cisneros-Velarde
37
1
0
05 Oct 2024
AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents
Zhe Su
Xuhui Zhou
Sanketh Rangreji
Anubha Kabra
Julia Mendelsohn
Faeze Brahman
Maarten Sap
LLMAG
95
2
0
13 Sep 2024
LLMs generate structurally realistic social networks but overestimate political homophily
Serina Chang
Alicja Chaszczewicz
Emma Wang
Maya Josifovska
Emma Pierson
J. Leskovec
40
6
0
29 Aug 2024
MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents
Yanqi Dai
Huanran Hu
Lei Wang
Shengjie Jin
X. Chen
Zhiwu Lu
LLMAG
56
7
0
08 Aug 2024
PersLLM: A Personified Training Approach for Large Language Models
Zheni Zeng
Jiayi Chen
H. Chen
Yukun Yan
Yuxuan Chen
Zhenghao Liu
Zhiyuan Liu
Maosong Sun
LLMAG
37
2
0
17 Jul 2024
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo
Minh Chien Vu
Jenny Chim
Han Hu
Wenhao Yu
...
David Lo
Daniel Fried
Xiaoning Du
H. D. Vries
Leandro von Werra
65
128
0
22 Jun 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Seungone Kim
Juyoung Suk
Ji Yong Cho
Shayne Longpre
Chaeeun Kim
...
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
ELM
ALM
LM&MA
97
29
0
09 Jun 2024
A Survey on Self-Evolution of Large Language Models
Zhengwei Tao
Ting-En Lin
Xiancai Chen
Hangyu Li
Yuchuan Wu
Yongbin Li
Zhi Jin
Fei Huang
Dacheng Tao
Jingren Zhou
LRM
LM&Ro
49
21
0
22 Apr 2024
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
Ruohong Zhang
Liangke Gui
Zhiqing Sun
Yihao Feng
Keyang Xu
...
Di Fu
Chunyuan Li
Alexander G. Hauptmann
Yonatan Bisk
Yiming Yang
MLLM
43
57
0
01 Apr 2024
Symmetry-Breaking Augmentations for Ad Hoc Teamwork
Ravi Hammond
Dustin Craggs
Mingyu Guo
Jakob Foerster
Ian Reid
23
1
0
15 Feb 2024
Can Large Language Model Agents Simulate Human Trust Behaviors?
Chengxing Xie
Canyu Chen
Feiran Jia
Ziyu Ye
Kai Shu
Adel Bibi
Ziniu Hu
Philip H. S. Torr
Bernard Ghanem
G. Li
LM&Ro
LLMAG
74
53
0
07 Feb 2024
PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits
Hang Jiang
Xiajie Zhang
Xubo Cao
Cynthia Breazeal
Deb Roy
Jad Kabbara
49
73
0
04 May 2023
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
215
1,727
0
07 Apr 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
233
2,470
0
06 Oct 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,402
0
28 Jan 2022
TEACh: Task-driven Embodied Agents that Chat
Aishwarya Padmakumar
Jesse Thomason
Ayush Shrivastava
P. Lange
Anjali Narayan-Chen
Spandana Gella
Robinson Piramithu
Gökhan Tür
Dilek Z. Hakkani-Tür
LM&Ro
152
180
0
01 Oct 2021
"Other-Play" for Zero-Shot Coordination
Hengyuan Hu
Adam Lerer
A. Peysakhovich
Jakob N. Foerster
VLM
OffRL
133
215
0
06 Mar 2020