Beyond Preferences in AI Alignment. Philosophical Studies (Philos. Stud.), 2024
Human Control: Definitions and Algorithms. Conference on Uncertainty in Artificial Intelligence (UAI), 2023
AGI Agent Safety by Iteratively Improving the Utility Function. Artificial General Intelligence (AGI), 2020