ResearchTrend.AI

Occam's razor is insufficient to infer the preferences of irrational agents

15 December 2017
Stuart Armstrong
Sören Mindermann

Papers citing "Occam's razor is insufficient to infer the preferences of irrational agents"

26 / 26 papers shown
Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree?
Xueru Wen
Jie Lou
Yaojie Lu
Hongyu Lin
Xing Yu
Xinyu Lu
Xianpei Han
Debing Zhang
Le Sun
ALM
69
5
0
17 Feb 2025
Agency Is Frame-Dependent
David Abel
André Barreto
Michael Bowling
Will Dabney
Shi Dong
...
Doina Precup
Jonathan Richens
Mark Rowland
Tom Schaul
Satinder Singh
93
1
0
06 Feb 2025
Value Preferences Estimation and Disambiguation in Hybrid Participatory Systems
Enrico Liscio
Luciano Cavalcante Siebert
Catholijn M. Jonker
P. Murukannaiah
42
4
0
26 Feb 2024
Cooperation and Control in Delegation Games
Oliver Sourbut
Lewis Hammond
Harriet Wood
37
0
0
24 Feb 2024
(Ir)rationality in AI: State of the Art, Research Challenges and Open Questions
Olivia Macmillan-Scott
Mirco Musolesi
42
1
0
28 Nov 2023
Inverse Decision Modeling: Learning Interpretable Representations of Behavior
Daniel Jarrett
Alihan Huyuk
M. Schaar
AI4CE
22
27
0
28 Oct 2023
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
David Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
227
197
0
20 Oct 2023
Designing Fiduciary Artificial Intelligence
Sebastian Benthall
David Shekman
51
4
0
27 Jul 2023
AI Alignment Dialogues: An Interactive Approach to AI Alignment in Support Agents
Pei-Yu Chen
Myrthe L. Tielman
Dirk K. J. Heylen
Catholijn M. Jonker
M. Birna van Riemsdijk
8
2
0
16 Jan 2023
Misspecification in Inverse Reinforcement Learning
Joar Skalse
Alessandro Abate
33
22
0
06 Dec 2022
Reinforcement Learning and Bandits for Speech and Language Processing: Tutorial, Review and Outlook
Baihan Lin
OffRL
AI4TS
34
27
0
24 Oct 2022
Scaling Laws for Reward Model Overoptimization
Leo Gao
John Schulman
Jacob Hilton
ALM
41
493
0
19 Oct 2022
Perspectives on Incorporating Expert Feedback into Model Updates
Valerie Chen
Umang Bhatt
Hoda Heidari
Adrian Weller
Ameet Talwalkar
40
11
0
13 May 2022
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black
Stella Biderman
Eric Hallahan
Quentin G. Anthony
Leo Gao
...
Shivanshu Purohit
Laria Reynolds
J. Tow
Benqi Wang
Samuel Weinbach
102
803
0
14 Apr 2022
The dangers in algorithms learning humans' values and irrationalities
Rebecca Gorman
Stuart Armstrong
25
2
0
28 Feb 2022
Impossibility Results in AI: A Survey
Mario Brčič
Roman V. Yampolskiy
29
25
0
01 Sep 2021
Toward AI Assistants That Let Designers Design
Sebastiaan De Peuter
Antti Oulasvirta
Samuel Kaski
AI4CE
29
19
0
22 Jul 2021
Uncertain Decisions Facilitate Better Preference Learning
Cassidy Laidlaw
Stuart J. Russell
30
11
0
19 Jun 2021
ROAD: The ROad event Awareness Dataset for Autonomous Driving
Gurkirt Singh
Stephen Akrigg
Manuele Di Maio
Valentina Fontana
Reza Javanmard Alitappeh
...
Salman Khan
S. Grazioso
Andrew Bradley
G. Gironimo
Fabio Cuzzolin
32
89
0
23 Feb 2021
Open Problems in Cooperative AI
Allan Dafoe
Edward Hughes
Yoram Bachrach
Tantum Collins
Kevin R. McKee
Joel Z Leibo
Kate Larson
T. Graepel
42
200
0
15 Dec 2020
Online Bayesian Goal Inference for Boundedly-Rational Planning Agents
Zhi-Xuan Tan
Jordyn L. Mann
Tom Silver
J. Tenenbaum
Vikash K. Mansinghka
OffRL
26
89
0
13 Jun 2020
Risks from Learned Optimization in Advanced Machine Learning Systems
Evan Hubinger
Chris van Merwijk
Vladimir Mikulik
Joar Skalse
Scott Garrabrant
45
145
0
05 Jun 2019
Unpredictability of AI
Roman V. Yampolskiy
21
30
0
29 May 2019
Embedded Agency
A. Demski
Scott Garrabrant
AIFin
35
34
0
25 Feb 2019
Scalable agent alignment via reward modeling: a research direction
Jan Leike
David M. Krueger
Tom Everitt
Miljan Martic
Vishal Maini
Shane Legg
34
397
0
19 Nov 2018
Exploring Hierarchy-Aware Inverse Reinforcement Learning
Chris Cundy
Daniel Filan
BDL
OffRL
37
5
0
13 Jul 2018