The AI off-switch problem as a signalling game: bounded rationality and incomparability

10 February 2025
Alessio Benavoli
Alessandro Facchini
Marco Zaffalon
Abstract

The off-switch problem is a critical challenge in AI control: if an AI system resists being switched off, it poses a significant risk. In this paper, we model the off-switch problem as a signalling game, where a human decision-maker communicates its preferences about some underlying decision problem to an AI agent, which then selects actions to maximise the human's utility. We assume that the human is a bounded rational agent and explore various bounded rationality mechanisms. Using real machine learning models, we reprove prior results and demonstrate that a necessary condition for an AI system to refrain from disabling its off-switch is its uncertainty about the human's utility. We also analyse how message costs influence optimal strategies and extend the analysis to scenarios involving incomparability.
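
The role of uncertainty can be illustrated with a toy calculation. The sketch below is not the paper's signalling-game model: it assumes a simplified one-shot setting in which the AI either acts, switches itself off (utility 0), or defers to a human who then permits or blocks the action. The function names, the discrete belief representation, and the logistic (Boltzmann-style) choice rule used for the bounded-rational human are all assumptions made for illustration.

```python
import math

def expected_values(belief, beta=None):
    """Expected human utility of each AI option in a toy off-switch setting.

    belief: list of (utility, probability) pairs describing the AI's belief
            about the human's utility u for the proposed action.
    beta:   None for a perfectly rational human (permits the action iff u > 0);
            otherwise an assumed logistic choice rule with rationality beta.
    """
    def allow_prob(u):
        # Probability that the human lets the action proceed.
        if beta is None:
            return 1.0 if u > 0 else 0.0
        return 1.0 / (1.0 + math.exp(-beta * u))

    act = sum(p * u for u, p in belief)       # act without asking
    off = 0.0                                  # switch itself off
    defer = sum(p * allow_prob(u) * u for u, p in belief)  # leave it to the human
    return {"act": act, "off": off, "defer": defer}

def deference_gain(belief, beta=None):
    """How much better deferring is than the best unilateral option."""
    v = expected_values(belief, beta)
    return v["defer"] - max(v["act"], v["off"])

if __name__ == "__main__":
    certain = [(1.0, 1.0)]                     # AI is sure the action is good
    uncertain = [(1.0, 0.5), (-1.0, 0.5)]      # AI is unsure of the sign of u
    print("certain, rational human:  ", deference_gain(certain))
    print("uncertain, rational human:", deference_gain(uncertain))
    print("uncertain, noisy human:   ", deference_gain(uncertain, beta=1.0))
    print("certain, noisy human:     ", deference_gain(certain, beta=1.0))
```

In this toy setting, an AI with a point belief gains nothing by deferring, while an uncertain AI gains from leaving the decision to the human; a noisy human shrinks that gain and can make it negative when the AI is already certain.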

@article{benavoli2025_2502.06403,
  title={The AI off-switch problem as a signalling game: bounded rationality and incomparability},
  author={Alessio Benavoli and Alessandro Facchini and Marco Zaffalon},
  journal={arXiv preprint arXiv:2502.06403},
  year={2025}
}