arXiv: 2310.11667
SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents

18 October 2023
Xuhui Zhou
Hao Zhu
Leena Mathur
Ruohong Zhang
Haofei Yu
Zhengyang Qi
Louis-Philippe Morency
Yonatan Bisk
Daniel Fried
Graham Neubig
Maarten Sap
    LLMAG

Papers citing "SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents"

50 / 91 papers shown
Automated Meta Prompt Engineering for Alignment with the Theory of Mind
Aaron Baughman
Rahul Agarwal
Eduardo Morales
Gozde Akay
7
0
0
13 May 2025
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models
Bang Zhang
Ruotian Ma
Qingxuan Jiang
Peisong Wang
Jiaqi Chen
...
Fanghua Ye
Jian Li
Yifan Yang
Zhaopeng Tu
Xiaolong Li
LLMAG
ELM
ALM
102
25
1
01 May 2025
A Survey on Large Language Model based Human-Agent Systems
Henry Peng Zou
Wei-Chieh Huang
Yaozu Wu
Yankai Chen
Chunyu Miao
...
Y. Li
Yuwei Cao
Dongyuan Li
Renhe Jiang
Philip S. Yu
LLMAG
LM&Ro
LM&MA
79
0
0
01 May 2025
MF-LLM: Simulating Collective Decision Dynamics via a Mean-Field Large Language Model Framework
Qirui Mi
Mengyue Yang
Xiangning Yu
Zhiyu Zhao
Cheng Deng
Bo An
H. Zhang
Xu Chen
J. Wang
36
0
0
30 Apr 2025
Auto-SLURP: A Benchmark Dataset for Evaluating Multi-Agent Frameworks in Smart Personal Assistant
Lei Shen
Xiaoyu Shen
56
0
0
25 Apr 2025
Rethinking Theory of Mind Benchmarks for LLMs: Towards A User-Centered Perspective
Qiaosi Wang
Xuhui Zhou
Maarten Sap
Jodi Forlizzi
Hong Shen
33
0
0
15 Apr 2025
EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety
Jiahao Qiu
Yinghui He
Xinzhe Juan
Y. Wang
Y. Liu
Zixin Yao
Yue Wu
Xun Jiang
L. Yang
Mengdi Wang
AI4MH
68
0
0
13 Apr 2025
Playpen: An Environment for Exploring Learning Through Conversational Interaction
Nicola Horst
Davide Mazzaccara
Antonia Schmidt
Michael Sullivan
Filippo Momentè
...
Alexander Koller
Oliver Lemon
David Schlangen
Mario Giulianelli
Alessandro Suglia
OffRL
32
0
0
11 Apr 2025
Can LLMs Simulate Personas with Reversed Performance? A Benchmark for Counterfactual Instruction Following
Sai Adith Senthil Kumar
Hao Yan
Saipavan Perepa
Murong Yue
Ziyu Yao
57
0
0
08 Apr 2025
Verification of Autonomous Neural Car Control with KeYmaera X
Enguerrand Prebet
Samuel Teuber
André Platzer
31
0
0
04 Apr 2025
How Social is It? A Benchmark for LLMs' Capabilities in Multi-user Multi-turn Social Agent Tasks
Yusen Wu
Junwu Xiong
Xiaotie Deng
LLMAG
36
0
0
04 Apr 2025
Cultural Learning-Based Culture Adaptation of Language Models
Chen Cecilia Liu
Anna Korhonen
Iryna Gurevych
39
0
0
03 Apr 2025
If an LLM Were a Character, Would It Know Its Own Story? Evaluating Lifelong Learning in LLMs
Siqi Fan
Xiusheng Huang
Yiqun Yao
Xuezhi Fang
Kang Liu
Peng Han
Shuo Shang
Aixin Sun
Yequan Wang
LLMAG
40
0
0
30 Mar 2025
Leveraging LLMs with Iterative Loop Structure for Enhanced Social Intelligence in Video Question Answering
Erika Mori
Yue Qiu
Hirokatsu Kataoka
Y. Aoki
51
0
0
27 Mar 2025
EmpathyAgent: Can Embodied Agents Conduct Empathetic Actions?
Xinyan Chen
Jiaxin Ge
Hongming Dai
Qiang Zhou
Qiuxuan Feng
Jingtong Hu
Y. Wang
Jiaming Liu
Shanghang Zhang
LM&Ro
65
0
0
19 Mar 2025
Guess What I am Thinking: A Benchmark for Inner Thought Reasoning of Role-Playing Language Agents
R. Xu
Mingyu Wang
Xintao Wang
Dakuan Lu
Xiaoyu Tan
Wei Chu
Yinghui Xu
LRM
LLMAG
61
0
0
11 Mar 2025
Persuasion at Play: Understanding Misinformation Dynamics in Demographic-Aware Human-LLM Interactions
Angana Borah
Rada Mihalcea
Verónica Pérez-Rosas
50
1
0
03 Mar 2025
Mind the (Belief) Gap: Group Identity in the World of LLMs
Angana Borah
Marwa Houalla
Rada Mihalcea
32
0
0
03 Mar 2025
EgoNormia: Benchmarking Physical Social Norm Understanding
MohammadHossein Rezaei
Yicheng Fu
Phil Cuvin
Caleb Ziems
Y. Zhang
Hao Zhu
Diyi Yang
EgoV
48
1
0
27 Feb 2025
Exploring and Controlling Diversity in LLM-Agent Conversation
Kuanchao Chu
Yi-Pei Chen
Hideki Nakayama
LLMAG
42
1
0
24 Feb 2025
Grounded Persuasive Language Generation for Automated Marketing
Jibang Wu
Chenghao Yang
Simon Mahns
Chaoqi Wang
Hao Zhu
Fei Fang
Haifeng Xu
38
1
0
24 Feb 2025
InfoQuest: Evaluating Multi-Turn Dialogue Agents for Open-Ended Conversations with Hidden Context
Bryan L. M. de Oliveira
Luana G. B. Martins
Bruno Brandão
L. Melo
ELM
119
1
0
17 Feb 2025
AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society
J. Piao
Yuwei Yan
Jun Zhang
Nian Li
Junbo Yan
...
Fengli Xu
Fang Zhang
Ke Rong
Jun Su
Y. Li
AI4CE
73
8
0
12 Feb 2025
HamRaz: A Culture-Based Persian Conversation Dataset for Person-Centered Therapy Using LLM Agents
Mohammad Amin Abbasi
Farnaz Sadat Mirnezami
Hassan Naderi
42
1
0
09 Feb 2025
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Frank F. Xu
Yufan Song
Boxuan Li
Yuxuan Tang
Kritanjali Jain
...
Wayne Chi
Lawrence Jang
Yiqing Xie
Shuyan Zhou
Graham Neubig
LLMAG
124
21
0
18 Dec 2024
Assessing the Impact of Conspiracy Theories Using Large Language Models
Bohan Jiang
Dawei Li
Zhen Tan
Xinyi Zhou
Ashwin Rao
Kristina Lerman
H. Bernard
Huan Liu
82
2
0
09 Dec 2024
From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents
Xinyi Mou
Xuanwen Ding
Qi He
Liang Wang
Jingcong Liang
...
L. Sun
Jiayu Lin
Jie Zhou
Xuanjing Huang
Zhongyu Wei
LLMAG
LM&Ro
AI4CE
80
12
0
04 Dec 2024
Towards Full Delegation: Designing Ideal Agentic Behaviors for Travel Planning
Song Jiang
Da JU
Andrew Cohen
Sasha Mitts
Aaron Foss
Justine T Kao
Xian Li
Yuandong Tian
62
2
0
21 Nov 2024
Minion: A Technology Probe for Resolving Value Conflicts through Expert-Driven and User-Driven Strategies in AI Companion Applications
Xianzhe Fan
Qing Xiao
Xuhui Zhou
Yuran Su
Zhicong Lu
Maarten Sap
Hong Shen
28
0
0
11 Nov 2024
MorphAgent: Empowering Agents through Self-Evolving Profiles and Decentralized Collaboration
Siyuan Lu
Jiaqi Shao
B. Luo
Tao Lin
LM&Ro
LLMAG
AI4CE
29
2
0
19 Oct 2024
Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems
Donghyun Lee
Mo Tiwari
LLMAG
26
9
0
09 Oct 2024
Intriguing Properties of Large Language and Vision Models
Young-Jun Lee
ByungSoo Ko
Han-Gyu Kim
Yechan Hwang
Ho-Jin Choi
LRM
VLM
43
0
0
07 Oct 2024
Large Language Models can Achieve Social Balance
Pedro Cisneros-Velarde
37
1
0
05 Oct 2024
Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions
Angana Borah
Rada Mihalcea
30
7
0
03 Oct 2024
'Simulacrum of Stories': Examining Large Language Models as Qualitative Research Participants
Shivani Kapania
William Agnew
Motahhare Eslami
Hoda Heidari
Sarah E Fox
34
4
0
28 Sep 2024
The Imperative of Conversation Analysis in the Era of LLMs: A Survey of Tasks, Techniques, and Trends
Xinghua Zhang
Haiyang Yu
Yongbin Li
Minzheng Wang
Longze Chen
Fei Huang
35
5
0
21 Sep 2024
Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation
Cheng Charles Ma
Kevin Hyekang Joo
Alexandria K. Vail
Sunreeta Bhattacharya
Álvaro Fernández García
Kailana Baker-Matsuoka
Sheryl Mathew
Lori L. Holt
Fernando De la Torre
47
3
0
13 Sep 2024
AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents
Zhe Su
Xuhui Zhou
Sanketh Rangreji
Anubha Kabra
Julia Mendelsohn
Faeze Brahman
Maarten Sap
LLMAG
95
2
0
13 Sep 2024
SimulBench: Evaluating Language Models with Creative Simulation Tasks
Qi Jia
Xiang Yue
Tianyu Zheng
Jie Huang
Bill Yuchen Lin
LM&MA
34
3
0
11 Sep 2024
PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action
Yijia Shao
Tianshi Li
Weiyan Shi
Yanchen Liu
Diyi Yang
PILM
49
13
0
29 Aug 2024
LLMs generate structurally realistic social networks but overestimate political homophily
Serina Chang
Alicja Chaszczewicz
Emma Wang
Maya Josifovska
Emma Pierson
J. Leskovec
40
6
0
29 Aug 2024
Fostering Natural Conversation in Large Language Models with NICO: a Natural Interactive COnversation dataset
Renliang Sun
Mengyuan Liu
Shiping Yang
Rui Wang
Junqing He
Jiaxing Zhang
25
2
0
18 Aug 2024
MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents
Yanqi Dai
Huanran Hu
Lei Wang
Shengjie Jin
X. Chen
Zhiwu Lu
LLMAG
56
7
0
08 Aug 2024
SynCPKL: Harnessing LLMs to Generate Synthetic Data for Commonsense Persona Knowledge Linking
Kuan-Yen Lin
38
0
0
21 Jul 2024
Werewolf Arena: A Case Study in LLM Evaluation via Social Deduction
Suma Bailis
Jane Friedhoff
Feiyang Chen
40
4
0
18 Jul 2024
PersLLM: A Personified Training Approach for Large Language Models
Zheni Zeng
Jiayi Chen
H. Chen
Yukun Yan
Yuxuan Chen
Zhenghao Liu
Zhiyuan Liu
Maosong Sun
LLMAG
37
2
0
17 Jul 2024
Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning
Yadong Zhang
Shaoguang Mao
Wenshan Wu
Yan Xia
Tao Ge
Man Lan
Furu Wei
48
2
0
08 Jul 2024
Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory
Suyeon Lee
Sunghwan Kim
Minju Kim
Dongjin Kang
Dongil Yang
...
Seungbeen Lee
Kyoung-Mee Chung
Youngjae Yu
Dongha Lee
Jinyoung Yeo
29
5
0
03 Jul 2024
Mitigating Hallucination in Fictional Character Role-Play
Nafis Sadeq
Zhouhang Xie
Byungkyu Kang
Prarit Lamba
Xiang Gao
Julian McAuley
HILM
33
6
0
25 Jun 2024
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo
Minh Chien Vu
Jenny Chim
Han Hu
Wenhao Yu
...
David Lo
Daniel Fried
Xiaoning Du
H. D. Vries
Leandro von Werra
65
128
0
22 Jun 2024