Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks
arXiv:2302.08399

16 February 2023
T. Ullman
    LRM

Papers citing "Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks"

50 / 99 papers shown
Spot The Ball: A Benchmark for Visual Social Inference
Neha Balamurugan
Sarah Wu
Adam Chun
Gabe Gaw
Cristobal Eyzaguirre
Tobias Gerstenberg
LRM
75
0
0
31 Oct 2025
Are Large Language Models Sensitive to the Motives Behind Communication?
Addison J. Wu
Ryan Liu
Kerem Oktar
T. Sumers
Thomas L. Griffiths
88
0
0
22 Oct 2025
TactfulToM: Do LLMs Have the Theory of Mind Ability to Understand White Lies?
Yiwei Liu
Emma Jane Pretty
Jiahao Huang
Saku Sugawara
112
0
0
21 Sep 2025
Rationality Check! Benchmarking the Rationality of Large Language Models
Zhilun Zhou
Jing Yi Wang
Nicholas Sukiennik
Chen Gao
Fengli Xu
Yong Li
James Evans
LRM
80
0
0
18 Sep 2025
ToM-SSI: Evaluating Theory of Mind in Situated Social Interactions
Matteo Bortoletto
Constantin Ruhdorfer
Andreas Bulling
99
0
0
05 Sep 2025
Memorization $\neq$ Understanding: Do Large Language Models Have the Ability of Scenario Cognition?
Boxiang Ma
Ru Li
Yuanlong Wang
Hongye Tan
Xiaoli Li
68
1
0
05 Sep 2025
The Quasi-Creature and the Uncanny Valley of Agency: A Synthesis of Theory and Evidence on User Interaction with Inconsistent Generative AI
Mauricio Manhaes
Christine Miller
Nicholas Schroeder
44
0
0
25 Aug 2025
Human-Alignment and Calibration of Inference-Time Uncertainty in Large Language Models
Kyle Moore
Jesse Roberts
Daryl Watson
72
1
0
11 Aug 2025
Recognising, Anticipating, and Mitigating LLM Pollution of Online Behavioural Research
Raluca Rilla
Tobias Werner
Hiromu Yakura
Iyad Rahwan
Anne-Marie Nussberger
72
0
0
02 Aug 2025
Integrating LLM in Agent-Based Social Simulation: Opportunities and Challenges
P. Taillandier
Jean Daniel Zucker
Arnaud Grignard
Benoit Gaudou
Nghi Quang Huynh
A. Drogoul
LLMAG
195
3
0
25 Jul 2025
Language Models Might Not Understand You: Evaluating Theory of Mind via Story Prompting
Nathaniel Getachew
Abulhair Saparov
LRM
103
0
0
23 Jun 2025
From Prompts to Constructs: A Dual-Validity Framework for LLM Research in Psychology
Zhicheng Lin
162
2
0
20 Jun 2025
Language-Informed Synthesis of Rational Agent Models for Grounded Theory-of-Mind Reasoning On-The-Fly
Lance Ying
Ryan Truong
Katherine M. Collins
Cedegao E. Zhang
Megan Wei
Tyler Brooke-Wilson
Tan Zhi-Xuan
Lionel Wong
J. Tenenbaum
LLMAG
128
5
0
20 Jun 2025
Can structural correspondences ground real world representational content in Large Language Models?
Iwan Williams
112
1
0
19 Jun 2025
From Black Boxes to Transparent Minds: Evaluating and Enhancing the Theory of Mind in Multimodal Large Language Models
Xinyang Li
Siqi Liu
Bochao Zou
Jiansheng Chen
Huimin Ma
140
1
0
17 Jun 2025
EvolvTrip: Enhancing Literary Character Understanding with Temporal Theory-of-Mind Graphs
Bohao Yang
Hainiu Xu
Jinhua Du
Ze Li
Petr Slovak
Chenghua Lin
129
0
0
16 Jun 2025
MotiveBench: How Far Are We From Human-Like Motivational Reasoning in Large Language Models?
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xixian Yong
Jianxun Lian
Xiaoyuan Yi
Xiao Zhou
Xing Xie
LRM
155
0
0
16 Jun 2025
UniToMBench: Integrating Perspective-Taking to Improve Theory of Mind in LLMs
Prameshwar Thiyagarajan
Vaishnavi Parimi
Shamant Sai
Soumil Garg
Zhangir Meirbek
Nitin Yarlagadda
Kevin Zhu
Chris Kim
LLMAG
146
1
0
11 Jun 2025
Let's Put Ourselves in Sally's Shoes: Shoes-of-Others Prefixing Improves Theory of Mind in Large Language Models
Kazutoshi Shinoda
Nobukatsu Hojo
Kyosuke Nishida
Yoshihiro Yamazaki
Keita Suzuki
Hiroaki Sugiyama
Kuniko Saito
170
1
0
06 Jun 2025
XToM: Exploring the Multilingual Theory of Mind for Large Language Models
Chunkit Chan
Yauwai Yim
Hongchuan Zeng
Zhiying Zou
Xinyuan Cheng
...
Ginny Wong
Helmut Schmid
Hinrich Schütze
Simon See
Yangqiu Song
LRM
156
0
0
03 Jun 2025
Representations of Fact, Fiction and Forecast in Large Language Models: Epistemics and Attitudes
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Meng Li
Michael Vrazitulis
David Schlangen
177
0
0
02 Jun 2025
Frictional Agent Alignment Framework: Slow Down and Don't Break Things
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Abhijnan Nath
Carine Graff
Andrei Bachinin
Nikhil Krishnaswamy
225
4
0
26 May 2025
Large Language Models' Reasoning Stalls: An Investigation into the Capabilities of Frontier Models
Lachlan McGinness
Peter Baumgartner
ReLM LRM ELM
391
1
0
26 May 2025
MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems
Xuanming Zhang
Yuxuan Chen
Min-Hsuan Yeh
Yixuan Li
LLMAG AI4CE
239
5
0
25 May 2025
Multi-Party Conversational Agents: A Survey
Sagar Sapkota
M. Hasan
Mubarak Shah
Santu Karmaker
LLMAG
203
0
0
24 May 2025
Towards Dynamic Theory of Mind: Evaluating LLM Adaptation to Temporal Evolution of Human States
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yang Xiao
Jiashuo Wang
Qiancheng Xu
Changhe Song
Chunpu Xu
Yi Cheng
Wenjie Li
Pengfei Liu
367
5
0
23 May 2025
DEL-ToM: Inference-Time Scaling for Theory-of-Mind Reasoning via Dynamic Epistemic Logic
Yuheng Wu
Jianwen Xie
Denghui Zhang
Zhaozhuo Xu
LRM VLM
134
2
0
22 May 2025
Language Models use Lookbacks to Track Beliefs
Nikhil Prakash
Natalie Shapira
Arnab Sen Sharma
Christoph Riedl
Yonatan Belinkov
Tamar Rott Shaham
David Bau
Atticus Geiger
KELM
247
7
0
20 May 2025
The Influence of Human-inspired Agentic Sophistication in LLM-driven Strategic Reasoners
Vince Trencsenyi
Agnieszka Mensfelt
Kostas Stathis
LRM
410
2
0
14 May 2025
Do Large Language Models know who did what to whom?
Joseph M. Denning
Xiaohan
Bryor Snefjella
Idan A. Blank
483
2
0
23 Apr 2025
Rethinking Theory of Mind Benchmarks for LLMs: Towards A User-Centered Perspective
Qiaosi Wang
Xuhui Zhou
Maarten Sap
Jodi Forlizzi
Hong Shen
217
8
0
15 Apr 2025
The Human Robot Social Interaction (HSRI) Dataset: Benchmarking Foundational Models' Social Reasoning
Dong Won Lee
Y. Kim
Denison Guvenoz
Sooyeon Jeong
Parker Malachowsky
Louis-Philippe Morency
C. Breazeal
Hae Won Park
231
1
0
07 Apr 2025
Strong Memory, Weak Control: An Empirical Study of Executive Functioning in LLMs
Karin de Langis
J. Park
Bin Hu
Khanh Chi Le
Andreas Schramm
Michael C. Mensink
Andrew Elfenbein
Dongyeop Kang
276
3
0
03 Apr 2025
Measurement of LLM's Philosophies of Human Nature
Minheng Ni
Ennan Wu
Zidong Gong
Zhiyong Yang
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
Lijuan Wang
Wangmeng Zuo
280
0
0
03 Apr 2025
Navigating Rifts in Human-LLM Grounding: Study and Benchmark
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Omar Shaikh
Hussein Mozannar
Gagan Bansal
Adam Fourney
Eric Horvitz
277
12
0
18 Mar 2025
Learning richness modulates equality reasoning in neural networks
William L. Tong
Cengiz Pehlevan
251
0
0
12 Mar 2025
PersuasiveToM: A Benchmark for Evaluating Machine Theory of Mind in Persuasive Dialogues
Fangxu Yu
Lai Jiang
Shenyi Huang
Zhen Wu
Xinyu Dai
LLMAG
405
6
0
28 Feb 2025
Social Genome: Grounded Social Reasoning Abilities of Multimodal Models
Leena Mathur
Marian Qian
Paul Pu Liang
Louis-Philippe Morency
LRM
992
13
0
21 Feb 2025
ExpertLens: Activation steering features are highly interpretable
Masha Fedzechkina
Eleonora Gualdoni
Sinead Williamson
Katherine Metcalf
Skyler Seto
B. Theobald
257
1
0
20 Feb 2025
Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models
Hyunwoo Kim
Melanie Sclar
Tan Zhi-Xuan
Lance Ying
Sydney Levine
Yang Liu
Joshua B. Tenenbaum
Yejin Choi
LRM LLMAG
202
11
0
17 Feb 2025
Why human-AI relationships need socioaffective alignment
Humanities and Social Sciences Communications (HSSC), 2025
Hannah Rose Kirk
Iason Gabriel
Chris Summerfield
Bertie Vidgen
Scott A. Hale
200
44
0
04 Feb 2025
Large Language Models as Theory of Mind Aware Generative Agents with Counterfactual Reflection
Bo Yang
Jiaxian Guo
Yusuke Iwasawa
Y. Matsuo
AI4CE
306
3
0
28 Jan 2025
Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Eitan Wagner
Nitay Alon
J. Barnby
Omri Abend
LRM
413
6
0
18 Dec 2024
Codenames as a Benchmark for Large Language Models
IEEE Transactions on Games (IEEE Trans. Games), 2024
Matthew Stephenson
Matthew Sidji
Benoît Ronval
LLMAG LRM ELM
438
2
0
16 Dec 2024
Multi-ToM: Evaluating Multilingual Theory of Mind Capabilities in Large Language Models
Jayanta Sadhu
Ayan Antik Khan
Noshin Nawal
Sanju Basak
Abhik Bhattacharjee
Rifat Shahriyar
256
5
0
24 Nov 2024
Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina
Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2024
Yuan Gao
Dokyun Lee
Gordon Burtch
Sina Fazelpour
LRM
415
40
0
25 Oct 2024
TMGBench: A Systematic Game Benchmark for Evaluating Strategic Reasoning Abilities of LLMs
Jian Shu
Xiachong Feng
Lei Li
Zhan Qin
Dianbo Sui
Lingpeng Kong
LRM ELM
277
10
0
14 Oct 2024
Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data
Jiaming Zhou
Abbas Ghaddar
Ge Zhang
Liheng Ma
Yaochen Hu
Soumyasundar Pal
Mark Coates
Bin Wang
Yingxue Zhang
Jianye Hao
ReLM LRM
281
9
0
19 Sep 2024
Integrated Design and Governance of Agentic AI Systems through Adaptive Information Modulation
Qiliang Chen
Sepehr Ilami
Nunzio Lorè
Babak Heydari
252
3
0
16 Sep 2024
CHARTOM: A Visual Theory-of-Mind Benchmark for LLMs on Misleading Charts
S. Bharti
Shiyun Cheng
Jihyun Rho
Martina Rao
Mu Cai
Yong Jae Lee
Martina Rau
Xiaojin Zhu
451
1
0
26 Aug 2024