Detecting and Deterring Manipulation in a Cognitive Hierarchy

3 May 2024
Nitay Alon
Lion Schulz
J. Barnby
J. Rosenschein
Peter Dayan
Abstract

Social agents with finitely nested opponent models are vulnerable to manipulation by agents with deeper reasoning and more sophisticated opponent modelling. This imbalance, rooted in logic and the theory of recursive modelling frameworks, cannot be solved directly. We propose a computational framework, ℵ-IPOMDP, augmenting model-based RL agents' Bayesian inference with an anomaly detection algorithm and an out-of-belief policy. Our mechanism allows agents to realize they are being deceived, even if they cannot understand how, and to deter opponents via a credible threat. We test this framework in both a mixed-motive and zero-sum game. Our results show the ℵ mechanism's effectiveness, leading to more equitable outcomes and less exploitation by more sophisticated agents. We discuss implications for AI safety, cybersecurity, cognitive science, and psychiatry.
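
The general idea of pairing Bayesian inference over opponent types with an anomaly check can be illustrated with a toy sketch. This is not the paper's ℵ-IPOMDP algorithm: the opponent models, the windowed log-evidence test, the threshold value, and the names `bayes_update` and `detect_anomaly` are all assumptions made for illustration. The agent keeps a belief over a finite set of opponent types and flags the interaction as "out of belief" when the observed actions are, on average, too unlikely under every model it holds. At that point an out-of-belief policy (for example, a punishing response) would take over.

```python
import math


def bayes_update(belief, models, action):
    """Return (posterior over opponent types, evidence of the observed action).

    `models[t][action]` is the hypothetical probability that type `t`
    plays `action`; missing actions are treated as probability 0.
    """
    unnorm = {t: belief[t] * models[t].get(action, 0.0) for t in belief}
    evidence = sum(unnorm.values())
    if evidence == 0.0:
        # No model explains the action: keep the old belief, report zero evidence.
        return dict(belief), 0.0
    return {t: p / evidence for t, p in unnorm.items()}, evidence


def detect_anomaly(observed, models, threshold=-1.5, window=3):
    """Scan the opponent's actions and report when they leave the belief set.

    Illustrative rule (an assumption): flag an anomaly when the mean
    log-evidence over the last `window` observations drops below `threshold`.
    Returns ("out_of_belief", step) or ("in_belief", None).
    """
    belief = {t: 1.0 / len(models) for t in models}  # uniform prior
    recent = []
    for step, action in enumerate(observed):
        belief, evidence = bayes_update(belief, models, action)
        recent.append(math.log(evidence) if evidence > 0.0 else float("-inf"))
        recent = recent[-window:]
        if len(recent) == window and sum(recent) / window < threshold:
            return "out_of_belief", step
    return "in_belief", None


if __name__ == "__main__":
    # Hypothetical opponent types for a two-action game (C = cooperate, D = defect).
    models = {
        "cooperative": {"C": 0.9, "D": 0.1},
        "random": {"C": 0.5, "D": 0.5},
    }
    print(detect_anomaly(["C", "C", "C", "C"], models))
    print(detect_anomaly(["X", "X", "X"], models))
```

The key design point in this sketch is that detection does not require identifying how the opponent is behaving, only that the behaviour is poorly explained by every model the agent can represent.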

@article{alon2025_2405.01870,
  title={Detecting and Deterring Manipulation in a Cognitive Hierarchy},
  author={Nitay Alon and Joseph M. Barnby and Stefan Sarkadi and Lion Schulz and Jeffrey S. Rosenschein and Peter Dayan},
  journal={arXiv preprint arXiv:2405.01870},
  year={2025}
}