The Reasons that Agents Act: Intention and Instrumental Goals

11 February 2024

Francis Rhys Ward

Matt MacDermott

Francesco Belardinelli

Papers citing "The Reasons that Agents Act: Intention and Instrumental Goals"

Title | Authors | Tags | Counts | Date
OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation | Yichen Wu, Xudong Pan, Geng Hong, Min Yang | LLMAG | 29 0 0 | 18 Apr 2025
Higher-Order Belief in Incomplete Information MAIDs | Jack Foxabbott, Rohan Subramani, Francis Rhys Ward | | 36 0 0 | 08 Mar 2025
Measuring Goal-Directedness | Matt MacDermott, James Fox, Francesco Belardinelli, Tom Everitt | | 78 1 0 | 06 Dec 2024
From Imitation to Introspection: Probing Self-Consciousness in Language Models | Sirui Chen, Shu Yu, Shengjie Zhao, Chaochao Lu | MILM, LRM | 30 1 0 | 24 Oct 2024
Evaluating Language Model Character Traits | Francis Rhys Ward, Zejia Yang, Alex Jackson, Randy Brown, Chandler Smith, Grace Colverd, Louis Thomson, Raymond Douglas, Patrik Bartak, Andrew Rowan | | 32 0 0 | 05 Oct 2024
Possible principles for aligned structure learning agents | Lancelot Da Costa, Tomáš Gavenčiak, David Hyland, Mandana Samiei, Cristian Dragos-Manta, Candice Pattisapu, Adeel Razi, Karl J. Friston | | 16 1 0 | 30 Sep 2024
AI Sandbagging: Language Models can Strategically Underperform on Evaluations | Teun van der Weij, Felix Hofstätter, Ollie Jaffe, Samuel F. Brown, Francis Rhys Ward | ELM | 30 22 0 | 11 Jun 2024
Robust agents learn causal world models | Jonathan G. Richens, Tom Everitt | OOD | 111 34 0 | 16 Feb 2024
Honesty Is the Best Policy: Defining and Mitigating AI Deception | Francis Rhys Ward, Francesco Belardinelli, Francesca Toni, Tom Everitt | | 110 27 0 | 03 Dec 2023
In-context Learning and Induction Heads | Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah | | 240 453 0 | 24 Sep 2022
User Tampering in Reinforcement Learning Recommender Systems | Charles Evans, Atoosa Kasirzadeh | OffRL, AAML | 79 39 0 | 09 Sep 2021