SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
arXiv:2310.11667

18 October 2023
Xuhui Zhou
Hao Zhu
Leena Mathur
Ruohong Zhang
Haofei Yu
Zhengyang Qi
Louis-Philippe Morency
Yonatan Bisk
Daniel Fried
Graham Neubig
Maarten Sap
    LLMAG

Papers citing "SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents"

41 / 91 papers shown
Autonomous Agents for Collaborative Task under Information Asymmetry
Wei Liu
Chenxi Wang
Yifei Wang
Zihao Xie
Rennai Qiu
Yufan Dang
Zhuoyun Du
Weize Chen
Cheng Yang
Chen Qian
LLMAG
36
4
0
21 Jun 2024
How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics
Nidhir Bhavsar
Jonathan Jordan
Sherzod Hakimov
David Schlangen
21
0
0
20 Jun 2024
InterIntent: Investigating Social Intelligence of LLMs via Intention Understanding in an Interactive Game Context
Ziyi Liu
Abhishek Anand
Pei Zhou
Jen-tse Huang
Jieyu Zhao
70
5
0
18 Jun 2024
Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner
Kenneth Li
Yiming Wang
Fernanda Viégas
Martin Wattenberg
30
6
0
17 Jun 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Seungone Kim
Juyoung Suk
Ji Yong Cho
Shayne Longpre
Chaeeun Kim
...
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
ELM
ALM
LM&MA
97
29
0
09 Jun 2024
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
Zhiheng Xi
Yiwen Ding
Wenxiang Chen
Boyang Hong
Honglin Guo
...
Qi Zhang
Xipeng Qiu
Xuanjing Huang
Zuxuan Wu
Yu-Gang Jiang
LLMAG
LM&Ro
38
29
0
06 Jun 2024
clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents
Anne Beyer
Kranti Chalamalasetti
Sherzod Hakimov
Brielen Madureira
P. Sadler
David Schlangen
LLMAG
30
4
0
31 May 2024
Generative Students: Using LLM-Simulated Student Profiles to Support Question Item Evaluation
Xinyi Lu
Xu Wang
AI4Ed
19
23
0
19 May 2024
Towards Generalizable Agents in Text-Based Educational Environments: A Study of Integrating RL with LLMs
Bahar Radmehr
Adish Singla
Tanja Käser
LLMAG
AI4CE
32
5
0
29 Apr 2024
From Persona to Personalization: A Survey on Role-Playing Language Agents
Jiangjie Chen
Xintao Wang
Rui Xu
Siyu Yuan
Yikai Zhang
...
Caiyu Hu
Siye Wu
Scott Ren
Ziquan Fu
Yanghua Xiao
50
76
0
28 Apr 2024
A Survey on Self-Evolution of Large Language Models
Zhengwei Tao
Ting-En Lin
Xiancai Chen
Hangyu Li
Yuchuan Wu
Yongbin Li
Zhi Jin
Fei Huang
Dacheng Tao
Jingren Zhou
LRM
LM&Ro
49
21
0
22 Apr 2024
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
Ruohong Zhang
Liangke Gui
Zhiqing Sun
Yihao Feng
Keyang Xu
...
Di Fu
Chunyuan Li
Alexander G. Hauptmann
Yonatan Bisk
Yiming Yang
MLLM
43
57
0
01 Apr 2024
Academically intelligent LLMs are not necessarily socially intelligent
Ruoxi Xu
Hongyu Lin
Xianpei Han
Le Sun
Yingfei Sun
ELM
29
6
0
11 Mar 2024
Social Intelligence Data Infrastructure: Structuring the Present and Navigating the Future
Minzhi Li
Weiyan Shi
Caleb Ziems
Diyi Yang
33
8
0
28 Feb 2024
Unveiling the Truth and Facilitating Change: Towards Agent-based Large-scale Social Movement Simulation
Xinyi Mou
Zhongyu Wei
Xuanjing Huang
LLMAG
21
29
0
26 Feb 2024
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
Kenneth Li
Samy Jelassi
Hugh Zhang
Sham Kakade
Martin Wattenberg
David Brandfonbrener
27
9
0
22 Feb 2024
Data-driven Discovery with Large Generative Models
Bodhisattwa Prasad Majumder
Harshit Surana
Dhruv Agarwal
Sanchaita Hazra
Ashish Sabharwal
Peter Clark
35
9
0
21 Feb 2024
IMBUE: Improving Interpersonal Effectiveness through Simulation and Just-in-time Feedback with Human-Language Model Interaction
Inna Wanyin Lin
Ashish Sharma
Christopher Rytting
Adam S. Miner
Jina Suh
Tim Althoff
27
11
0
19 Feb 2024
EmoBench: Evaluating the Emotional Intelligence of Large Language Models
Sahand Sabour
Siyang Liu
Zheyuan Zhang
June M. Liu
Jinfeng Zhou
Alvionna S. Sunaryo
Juanzi Li
Tatia M.C. Lee
Rada Mihalcea
Minlie Huang
27
11
0
19 Feb 2024
Network Formation and Dynamics Among Multi-LLMs
Marios Papachristou
Yuan Yuan
36
11
0
16 Feb 2024
Symmetry-Breaking Augmentations for Ad Hoc Teamwork
Ravi Hammond
Dustin Craggs
Mingyu Guo
Jakob Foerster
Ian Reid
23
1
0
15 Feb 2024
OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models
Hainiu Xu
Runcong Zhao
Lixing Zhu
Jinhua Du
Yulan He
68
18
0
08 Feb 2024
How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis
Federico Bianchi
P. Chia
Mert Yuksekgonul
Jacopo Tagliabue
Daniel Jurafsky
James Y. Zou
LLMAG
27
30
0
08 Feb 2024
TimeArena: Shaping Efficient Multitasking Language Agents in a Time-Aware Simulation
Yikai Zhang
Siyu Yuan
Caiyu Hu
Kyle Richardson
Yanghua Xiao
Jiangjie Chen
AI4CE
LLMAG
27
13
0
08 Feb 2024
Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation
Xianghe Pang
Shuo Tang
Rui Ye
Yuxin Xiong
Bolun Zhang
Yanfeng Wang
Siheng Chen
114
28
0
08 Feb 2024
Can Large Language Model Agents Simulate Human Trust Behaviors?
Chengxing Xie
Canyu Chen
Feiran Jia
Ziyu Ye
Kai Shu
Adel Bibi
Ziniu Hu
Philip H. S. Torr
Bernard Ghanem
G. Li
LM&Ro
LLMAG
74
53
0
07 Feb 2024
Large Language Model based Multi-Agents: A Survey of Progress and Challenges
Taicheng Guo
Xiuying Chen
Yaqi Wang
Ruidi Chang
Shichao Pei
Nitesh V. Chawla
Olaf Wiest
Xiangliang Zhang
LLMAG
LM&Ro
AI4CE
LRM
31
246
0
21 Jan 2024
Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives
Chen Gao
Xiaochong Lan
Nian Li
Yuan Yuan
Jingtao Ding
Zhilun Zhou
Fengli Xu
Yong Li
LLMAG
AI4CE
LM&Ro
29
99
0
19 Dec 2023
Urban Generative Intelligence (UGI): A Foundational Platform for Agents in Embodied City Environment
Fengli Xu
Jun Zhang
Chen Gao
J. Feng
Yong Li
AI4CE
LLMAG
19
28
0
19 Dec 2023
Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia
A. Vezhnevets
J. Agapiou
Avia Aharon
Ron Ziv
Jayd Matyas
Edgar A. Duéñez-Guzmán
William A. Cunningham
Simon Osindero
Danny Karmon
Joel Z. Leibo
LLMAG
LM&Ro
AI4CE
30
40
0
06 Dec 2023
Negotiating with LLMS: Prompt Hacks, Skill Gaps, and Reasoning Deficits
Johannes Schneider
Steffi Haag
Leona Chandra Kruse
10
14
0
26 Nov 2023
Simulating Opinion Dynamics with Networks of LLM-based Agents
Yun-Shiuan Chuang
Agam Goyal
Nikunj Harlalka
Siddharth Suresh
Robert Hawkins
Sijia Yang
Dhavan Shah
Junjie Hu
Timothy T. Rogers
AI4CE
19
53
0
16 Nov 2023
Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory
Niloofar Mireshghallah
Hyunwoo J. Kim
Xuhui Zhou
Yulia Tsvetkov
Maarten Sap
Reza Shokri
Yejin Choi
PILM
22
73
0
27 Oct 2023
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations
Kilichbek Haydarov
Xiaoqian Shen
Avinash Madasu
Mahmoud Salem
Jia Li
Gamaleldin F. Elsayed
Mohamed Elhoseiny
28
4
0
30 Aug 2023
PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits
Hang Jiang
Xiajie Zhang
Xubo Cao
Cynthia Breazeal
Deb Roy
Jad Kabbara
49
73
0
04 May 2023
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
215
1,727
0
07 Apr 2023
Grounding Language Models to Images for Multimodal Inputs and Outputs
Jing Yu Koh
Ruslan Salakhutdinov
Daniel Fried
MLLM
23
117
0
31 Jan 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
233
2,470
0
06 Oct 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,402
0
28 Jan 2022
TEACh: Task-driven Embodied Agents that Chat
Aishwarya Padmakumar
Jesse Thomason
Ayush Shrivastava
P. Lange
Anjali Narayan-Chen
Spandana Gella
Robinson Piramuthu
Gökhan Tür
Dilek Z. Hakkani-Tür
LM&Ro
155
180
0
01 Oct 2021
"Other-Play" for Zero-Shot Coordination
Hengyuan Hu
Adam Lerer
A. Peysakhovich
Jakob N. Foerster
VLM
OffRL
133
215
0
06 Mar 2020