2402.07221
The Reasons that Agents Act: Intention and Instrumental Goals
11 February 2024
Francis Rhys Ward
Matt MacDermott
Francesco Belardinelli
Francesca Toni
Tom Everitt
AI4CE
Papers citing
"The Reasons that Agents Act: Intention and Instrumental Goals"
11 / 11 papers shown
OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation
Yichen Wu
Xudong Pan
Geng Hong
Min Yang
LLMAG
29
0
0
18 Apr 2025
Higher-Order Belief in Incomplete Information MAIDs
Jack Foxabbott
Rohan Subramani
Francis Rhys Ward
36
0
0
08 Mar 2025
Measuring Goal-Directedness
Matt MacDermott
James Fox
Francesco Belardinelli
Tom Everitt
78
1
0
06 Dec 2024
From Imitation to Introspection: Probing Self-Consciousness in Language Models
Sirui Chen
Shu Yu
Shengjie Zhao
Chaochao Lu
MILM
LRM
30
1
0
24 Oct 2024
Evaluating Language Model Character Traits
Francis Rhys Ward
Zejia Yang
Alex Jackson
Randy Brown
Chandler Smith
Grace Colverd
Louis Thomson
Raymond Douglas
Patrik Bartak
Andrew Rowan
32
0
0
05 Oct 2024
Possible principles for aligned structure learning agents
Lancelot Da Costa
Tomáš Gavenčiak
David Hyland
Mandana Samiei
Cristian Dragos-Manta
Candice Pattisapu
Adeel Razi
Karl J. Friston
16
1
0
30 Sep 2024
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij
Felix Hofstätter
Ollie Jaffe
Samuel F. Brown
Francis Rhys Ward
ELM
30
22
0
11 Jun 2024
Robust agents learn causal world models
Jonathan G. Richens
Tom Everitt
OOD
111
34
0
16 Feb 2024
Honesty Is the Best Policy: Defining and Mitigating AI Deception
Francis Rhys Ward
Francesco Belardinelli
Francesca Toni
Tom Everitt
110
27
0
03 Dec 2023
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
240
453
0
24 Sep 2022
User Tampering in Reinforcement Learning Recommender Systems
Charles Evans
Atoosa Kasirzadeh
OffRL
AAML
79
39
0
09 Sep 2021