Occam's razor is insufficient to infer the preferences of irrational agents

15 December 2017

Papers citing "Occam's razor is insufficient to infer the preferences of irrational agents"

26 / 26 papers shown

Title
Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree? Xueru Wen Jie Lou Yaojie Lu Hongyu Lin Xing Yu Xinyu Lu Xianpei Han Debing Zhang Le Sun ALM 69 5 0 17 Feb 2025
Agency Is Frame-Dependent David Abel André Barreto Michael Bowling Will Dabney Shi Dong ... Doina Precup Jonathan Richens Mark Rowland Tom Schaul Satinder Singh 93 1 0 06 Feb 2025
Value Preferences Estimation and Disambiguation in Hybrid Participatory Systems Enrico Liscio Luciano Cavalcante Siebert Catholijn M. Jonker P. Murukannaiah 42 4 0 26 Feb 2024
Cooperation and Control in Delegation Games Oliver Sourbut Lewis Hammond Harriet Wood 37 0 0 24 Feb 2024
(Ir)rationality in AI: State of the Art, Research Challenges and Open Questions Olivia Macmillan-Scott Mirco Musolesi 42 1 0 28 Nov 2023
Inverse Decision Modeling: Learning Interpretable Representations of Behavior Daniel Jarrett Alihan Huyuk M. Schaar AI4CE 22 27 0 28 Oct 2023
Towards Understanding Sycophancy in Language Models Mrinank Sharma Meg Tong Tomasz Korbak David Duvenaud Amanda Askell ... Oliver Rausch Nicholas Schiefer Da Yan Miranda Zhang Ethan Perez 227 197 0 20 Oct 2023
Designing Fiduciary Artificial Intelligence Sebastian Benthall David Shekman 51 4 0 27 Jul 2023
AI Alignment Dialogues: An Interactive Approach to AI Alignment in Support Agents Pei-Yu Chen Myrthe L. Tielman Dirk K. J. Heylen Catholijn M. Jonker M. Birna van Riemsdijk 8 2 0 16 Jan 2023
Misspecification in Inverse Reinforcement Learning Joar Skalse Alessandro Abate 33 22 0 06 Dec 2022
Reinforcement Learning and Bandits for Speech and Language Processing: Tutorial, Review and Outlook Baihan Lin OffRL AI4TS 34 27 0 24 Oct 2022
Scaling Laws for Reward Model Overoptimization Leo Gao John Schulman Jacob Hilton ALM 41 493 0 19 Oct 2022
Perspectives on Incorporating Expert Feedback into Model Updates Valerie Chen Umang Bhatt Hoda Heidari Adrian Weller Ameet Talwalkar 40 11 0 13 May 2022
GPT-NeoX-20B: An Open-Source Autoregressive Language Model Sid Black Stella Biderman Eric Hallahan Quentin G. Anthony Leo Gao ... Shivanshu Purohit Laria Reynolds J. Tow Benqi Wang Samuel Weinbach 102 803 0 14 Apr 2022
The dangers in algorithms learning humans' values and irrationalities Rebecca Gorman Stuart Armstrong 25 2 0 28 Feb 2022
Impossibility Results in AI: A Survey Mario Brčič Roman V. Yampolskiy 29 25 0 01 Sep 2021
Toward AI Assistants That Let Designers Design Sebastiaan De Peuter Antti Oulasvirta Samuel Kaski AI4CE 29 19 0 22 Jul 2021
Uncertain Decisions Facilitate Better Preference Learning Cassidy Laidlaw Stuart J. Russell 30 11 0 19 Jun 2021
ROAD: The ROad event Awareness Dataset for Autonomous Driving Gurkirt Singh Stephen Akrigg Manuele Di Maio Valentina Fontana Reza Javanmard Alitappeh ... Salman Khan S. Grazioso Andrew Bradley G. Gironimo Fabio Cuzzolin 32 89 0 23 Feb 2021
Open Problems in Cooperative AI Allan Dafoe Edward Hughes Yoram Bachrach Tantum Collins Kevin R. McKee Joel Z Leibo Kate Larson T. Graepel 42 200 0 15 Dec 2020
Online Bayesian Goal Inference for Boundedly-Rational Planning Agents Zhi-Xuan Tan Jordyn L. Mann Tom Silver J. Tenenbaum Vikash K. Mansinghka OffRL 26 89 0 13 Jun 2020
Risks from Learned Optimization in Advanced Machine Learning Systems Evan Hubinger Chris van Merwijk Vladimir Mikulik Joar Skalse Scott Garrabrant 45 145 0 05 Jun 2019
Unpredictability of AI Roman V. Yampolskiy 21 30 0 29 May 2019
Embedded Agency A. Demski Scott Garrabrant AIFin 35 34 0 25 Feb 2019
Scalable agent alignment via reward modeling: a research direction Jan Leike David M. Krueger Tom Everitt Miljan Martic Vishal Maini Shane Legg 34 397 0 19 Nov 2018
Exploring Hierarchy-Aware Inverse Reinforcement Learning Chris Cundy Daniel Filan BDL OffRL 37 5 0 13 Jul 2018