CogBench: a large language model walks into a psychology lab

28 February 2024

Papers citing "CogBench: a large language model walks into a psychology lab"

Are Large Language Models Sensitive to the Motives Behind Communication?

164

22 Oct 2025

Unraveling the cognitive patterns of Large Language Models through module communities

Kushal Raj Bhandari

Pin-Yu Chen

Jianxi Gao

25 Aug 2025

How Do LLMs Persuade? Linear Probes Can Uncover Persuasion Dynamics in Multi-Turn Conversations

Brandon Jaipersaud

David M. Krueger

Ekdeep Singh Lubana

100

07 Aug 2025

How LLMs Comprehend Temporal Meaning in Narratives: A Case Study in Cognitive Evaluation of LLMs

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

116

18 Jul 2025

Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization

253

16 Jun 2025

Efficient Ensemble for Fine-tuning Language Models on Multiple Datasets

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

169

28 May 2025

Are Large Language Models Reliable AI Scientists? Assessing Reverse-Engineering of Black-Box Systems

326

23 May 2025

Understanding LLM Scientific Reasoning through Promptings and Model's Explanation on the Answers

...

Sridhar Krishnan Venkat Bhat

Venkat Bhat

ELM LRM

353

02 May 2025

Memorization and Knowledge Injection in Gated LLMs

317

30 Apr 2025

Toward Efficient Exploration by Large Language Model Agents

Dilip Arumugam

Thomas L. Griffiths

LLMAG

406

29 Apr 2025

Strong Memory, Weak Control: An Empirical Study of Executive Functioning in LLMs

356

03 Apr 2025

The Greatest Good Benchmark: Measuring LLMs' Alignment with Utilitarian Moral Dilemmas

Giovanni Franco Gabriel Marraffini

186

25 Mar 2025

Levels of Analysis for Large Language Models

...

365

17 Mar 2025

LLM Agents Display Human Biases but Exhibit Distinct Learning Patterns

Idan Horowitz

Ori Plonsky

266

13 Mar 2025

On Benchmarking Human-Like Intelligence in Machines

912

27 Feb 2025

Human Cognitive Benchmarks Reveal Foundational Visual Gaps in MLLMs

438

23 Feb 2025

Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture

International Joint Conference on Artificial Intelligence (IJCAI), 2024

John Burden

Marko Tesic

Lorenzo Pacchiardi

José Hernández-Orallo

308

21 Feb 2025

The potential -- and the pitfalls -- of using pre-trained language models as cognitive science theories

Raj Sanjay Shah

Sashank Varma

LRM

413

22 Jan 2025

Meaningless is better: hashing bias-inducing words in LLM prompts improves performance in logical reasoning and statistical learning

Milena Chadimová

Eduard Jurášek

Tomáš Kliegr

437

26 Nov 2024

VideoCogQA: A Controllable Benchmark for Evaluating Cognitive Abilities in Video-Language Models

418

14 Nov 2024

Game-theoretic LLM: Agent Workflow for Negotiation Games

...

368

08 Nov 2024

Can LLMs make trade-offs involving stipulated pain and pleasure states?

Blaise Agüera y Arcas

Jonathan Birch

218

01 Nov 2024

Large Language Model Benchmarks in Medical Tasks

...

695

28 Oct 2024

Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse

429

27 Oct 2024

TeachTune: Reviewing Pedagogical Agents Against Diverse Student Profiles with Simulated Students

International Conference on Human Factors in Computing Systems (CHI), 2024

340

05 Oct 2024

How Does Code Pretraining Affect Language Model Task Performance?

Jackson Petty

Sjoerd van Steenkiste

Tal Linzen

367

06 Sep 2024

Large Language Models and Cognitive Science: A Comprehensive Review of Similarities, Differences, and Challenges

Qian Niu

Junyu Liu

Ziqian Bi

Pohsun Feng

Benji Peng

...

Ming Li

Lawrence KQ Yan

Yichao Zhang

Caitlyn Heqi Yin

Cheng Fei

404

04 Sep 2024

Evaluating AI Evaluation: Perils and Prospects

John Burden

ELM

220

12 Jul 2024

Large Language Model Recall Uncertainty is Modulated by the Fan Effect

Doug Fisher

302

08 Jul 2024

Large Language Models Assume People are More Rational than We Really are

508

24 Jun 2024

M3GIA: A Cognition Inspired Multilingual and Multimodal General Intelligence Ability Benchmark

Yadong Li

...

272

08 Jun 2024

Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice

Jian-Qiao Zhu

Haijiang Yan

Thomas Griffiths

300

29 May 2024

Thai Winograd Schemas: A Benchmark for Thai Commonsense Reasoning

Phakphum Artkaew

LRM

171

28 May 2024

Large Language Models are Biased Reinforcement Learners

202

19 May 2024

Can large language models explore in-context?

Neural Information Processing Systems (NeurIPS), 2024

586

22 Mar 2024