ResearchTrend.AI
AI Safety Gridworlds

27 November 2017
Jan Leike
Miljan Martic
Victoria Krakovna
Pedro A. Ortega
Tom Everitt
Andrew Lefrancq
Laurent Orseau
Shane Legg

Papers citing "AI Safety Gridworlds"

44 / 144 papers shown
Title
Learning the Arrow of Time
Nasim Rahaman
Steffen Wolf
Anirudh Goyal
Roman Remme
Yoshua Bengio
14
5
0
02 Jul 2019
Evolutionary Computation and AI Safety: Research Problems Impeding Routine and Safe Real-world Application of Evolution
Joel Lehman
12
7
0
24 Jun 2019
Categorizing Wireheading in Partially Embedded Agents
Arushi G. K. Majha
Sayan Sarkar
Davide Zagami
11
3
0
21 Jun 2019
Epistemic Risk-Sensitive Reinforcement Learning
Hannes Eriksson
Christos Dimitrakakis
27
29
0
14 Jun 2019
Transfer Learning by Modeling a Distribution over Policies
Disha Shrivastava
Eeshan Gunesh Dhekane
Riashat Islam
OOD
OffRL
14
0
0
09 Jun 2019
A Perspective on Objects and Systematic Generalization in Model-Based RL
Sjoerd van Steenkiste
Klaus Greff
Jürgen Schmidhuber
OCL
OffRL
17
31
0
03 Jun 2019
Defining Admissible Rewards for High Confidence Policy Evaluation
Niranjani Prasad
Barbara E. Engelhardt
Finale Doshi-Velez
OffRL
31
6
0
30 May 2019
Don't Forget Your Teacher: A Corrective Reinforcement Learning Framework
M. Nazari
Majid Jahani
L. Snyder
Martin Takáč
OffRL
OnRL
16
1
0
30 May 2019
Inverse Reinforcement Learning in Contextual MDPs
Stav Belogolovsky
Philip Korsunsky
Shie Mannor
Chen Tessler
Tom Zahavy
OffRL
BDL
31
19
0
23 May 2019
A Human-Centered Approach to Interactive Machine Learning
K. Mathewson
16
7
0
15 May 2019
Safer Deep RL with Shallow MCTS: A Case Study in Pommerman
Bilal Kartal
Pablo Hernandez-Leal
Chao Gao
Matthew E. Taylor
OffRL
17
7
0
10 Apr 2019
Truly Batch Apprenticeship Learning with Deep Successor Features
Donghun Lee
Srivatsan Srinivasan
Finale Doshi-Velez
OffRL
OOD
9
35
0
24 Mar 2019
Improving Safety in Reinforcement Learning Using Model-Based Architectures and Human Intervention
Bharat Prakash
Mohit Khatwani
Nicholas R. Waytowich
T. Mohsenin
OffRL
7
19
0
22 Mar 2019
Safety-Guided Deep Reinforcement Learning via Online Gaussian Process Estimation
Jiameng Fan
Wenchao Li
OffRL
OnRL
GP
8
18
0
06 Mar 2019
Introspection Learning
Chris R. Serrano
Michael A. Warren
16
0
0
27 Feb 2019
Unmasking Clever Hans Predictors and Assessing What Machines Really Learn
Sebastian Lapuschkin
S. Wäldchen
Alexander Binder
G. Montavon
Wojciech Samek
K. Müller
17
996
0
26 Feb 2019
Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings
Tom Everitt
Pedro A. Ortega
Elizabeth Barnes
Shane Legg
CML
17
0
0
26 Feb 2019
Conservative Agency via Attainable Utility Preservation
Alexander Matt Turner
Dylan Hadfield-Menell
Prasad Tadepalli
30
49
0
26 Feb 2019
Parenting: Safe Reinforcement Learning from Human Input
Christopher Frye
Ilya Feige
14
7
0
18 Feb 2019
Iroko: A Framework to Prototype Reinforcement Learning for Data Center Traffic Control
Fabian Ruffy
Michael Przystupa
Ivan Beschastnikh
22
31
0
24 Dec 2018
Building Ethically Bounded AI
F. Rossi
Nicholas Mattei
8
75
0
10 Dec 2018
Scalable agent alignment via reward modeling: a research direction
Jan Leike
David M. Krueger
Tom Everitt
Miljan Martic
Vishal Maini
Shane Legg
34
397
0
19 Nov 2018
Integrative Biological Simulation, Neuropsychology, and AI Safety
G. Sarma
A. Safron
Nick J. Hay
14
2
0
07 Nov 2018
Formal Verification of Neural Network Controlled Autonomous Systems
Xiaowu Sun
Haitham Khedr
Yasser Shoukry
11
134
0
31 Oct 2018
Assessing Generalization in Deep Reinforcement Learning
Charles Packer
Katelyn Gao
Jernej Kos
Philipp Krahenbuhl
V. Koltun
D. Song
OffRL
18
233
0
29 Oct 2018
The Faults in Our Pi Stars: Security Issues and Open Challenges in Deep Reinforcement Learning
Vahid Behzadan
Arslan Munir
19
27
0
23 Oct 2018
Interpretable Multi-Objective Reinforcement Learning through Policy Orchestration
Ritesh Noothigattu
Djallel Bouneffouf
Nicholas Mattei
Rachita Chandra
Piyush Madan
Kush R. Varshney
Murray Campbell
Moninder Singh
F. Rossi
AI4CE
11
23
0
21 Sep 2018
Incorporating Behavioral Constraints in Online AI Systems
Avinash Balakrishnan
Djallel Bouneffouf
Nicholas Mattei
F. Rossi
OffRL
36
66
0
15 Sep 2018
Combining imagination and heuristics to learn strategies that generalize
Erik J Peterson
Necati Alp Muyesser
Timothy D. Verstynen
Kyle Dunovan
6
0
0
10 Sep 2018
Reinforcement Learning under Threats
Víctor Gallego
Roi Naveiro
D. Insua
AAML
21
25
0
05 Sep 2018
A Roadmap for Robust End-to-End Alignment
L. Hoang
28
1
0
04 Sep 2018
RuleMatrix: Visualizing and Understanding Classifiers with Rules
Yao Ming
Huamin Qu
E. Bertini
FAtt
20
214
0
17 Jul 2018
Modeling Friends and Foes
Pedro A. Ortega
Shane Legg
AAML
8
3
0
30 Jun 2018
Penalizing side effects using stepwise relative reachability
Victoria Krakovna
Laurent Orseau
Ramana Kumar
Miljan Martic
Shane Legg
20
55
0
04 Jun 2018
Virtuously Safe Reinforcement Learning
Henrik Aslund
El-Mahdi El-Mhamdi
R. Guerraoui
Alexandre Maurer
14
5
0
29 May 2018
Reward Constrained Policy Optimization
Chen Tessler
D. Mankowitz
Shie Mannor
11
536
0
28 May 2018
A Lyapunov-based Approach to Safe Reinforcement Learning
Yinlam Chow
Ofir Nachum
Edgar A. Duénez-Guzmán
Mohammad Ghavamzadeh
14
500
0
20 May 2018
AGI Safety Literature Review
Tom Everitt
G. Lea
Marcus Hutter
AI4CE
36
115
0
03 May 2018
Learning model-based strategies in simple environments with hierarchical q-networks
Necati Alp Muyesser
Kyle Dunovan
Timothy D. Verstynen
21
1
0
20 Jan 2018
'Indifference' methods for managing agent rewards
Stuart Armstrong
Xavier O'Rourke
11
19
0
18 Dec 2017
Occam's razor is insufficient to infer the preferences of irrational agents
Stuart Armstrong
Sören Mindermann
19
92
0
15 Dec 2017
Good and safe uses of AI Oracles
Stuart Armstrong
Xavier O'Rourke
32
26
0
15 Nov 2017
Safety Verification of Deep Neural Networks
Xiaowei Huang
Marta Kwiatkowska
Sen Wang
Min Wu
AAML
183
933
0
21 Oct 2016
Safe Exploration in Markov Decision Processes
T. Moldovan
Pieter Abbeel
78
308
0
22 May 2012