Concrete Problems in AI Safety

21 June 2016
Dario Amodei
C. Olah
Jacob Steinhardt
Paul Christiano
John Schulman
Dandelion Mané

Papers citing "Concrete Problems in AI Safety"

50 / 1,374 papers shown
FPR -- Fast Path Risk Algorithm to Evaluate Collision Probability
A. Blake
Alejandro Bordallo
Kamen Brestnichki
Majd Hawasly
Svetlin Penkov
S. Ramamoorthy
Alexandre Silva
85
6
0
15 Apr 2018
Incomplete Contracting and AI Alignment
Dylan Hadfield-Menell
Gillian Hadfield
170
97
0
12 Apr 2018
Toward Intelligent Vehicular Networks: A Machine Learning Framework
Le Liang
Hao Ye
Geoffrey Ye Li
130
220
0
01 Apr 2018
Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning
Nicolas Papernot
Patrick McDaniel
OOD, AAML
320
545
0
13 Mar 2018
The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities
Joel Lehman
Jeff Clune
D. Misevic
C. Adami
L. Altenberg
...
Danesh Tarapore
S. Thibault
Westley Weimer
R. Watson
Jason Yosinski
413
298
0
09 Mar 2018
Predictive Uncertainty Estimation via Prior Networks
Neural Information Processing Systems (NeurIPS), 2018
A. Malinin
Mark Gales
UD, BDL, EDL, UQCV, PER
473
1,028
0
28 Feb 2018
Semantic Vector Spaces for Broadening Consideration of Consequences
Douglas Summers Stay
49
0
0
23 Feb 2018
The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation
Miles Brundage
S. Avin
Jack Clark
H. Toner
P. Eckersley
...
Owain Evans
Michael Page
Joanna J. Bryson
Roman V. Yampolskiy
Dario Amodei
187
810
0
20 Feb 2018
Learning Confidence for Out-of-Distribution Detection in Neural Networks
Terrance Devries
Graham W. Taylor
OOD, OODD
250
628
0
13 Feb 2018
Learning from Richer Human Guidance: Augmenting Comparison-Based Learning with Feature Queries
Chandrayee Basu
M. Singhal
Anca Dragan
173
60
0
05 Feb 2018
A Cyber Science Based Ontology for Artificial General Intelligence Containment
Jason M. Pittman
Courtney E. Soboleski
188
3
0
28 Jan 2018
Safe Policy Improvement with Baseline Bootstrapping
Romain Laroche
P. Trichelair
Rémi Tachet des Combes
OffRL
357
212
0
19 Dec 2017
'Indifference' methods for managing agent rewards
Stuart Armstrong
Xavier O'Rourke
214
20
0
18 Dec 2017
Occam's razor is insufficient to infer the preferences of irrational agents
Stuart Armstrong
Sören Mindermann
475
93
0
15 Dec 2017
A Low-Cost Ethics Shaping Approach for Designing Reinforcement Learning Agents
Yueh-hua Wu
Shou-De Lin
OffRL
142
71
0
12 Dec 2017
AI Safety and Reproducibility: Establishing Robust Foundations for the Neuropsychology of Human Values
G. Sarma
Nick J. Hay
A. Safron
99
13
0
08 Dec 2017
Safe Exploration for Identifying Linear Systems via Robust Optimization
Tyler Lu
Martin A. Zinkevich
Craig Boutilier
Binz Roy
Dale Schuurmans
85
6
0
30 Nov 2017
AI Safety Gridworlds
Jan Leike
Miljan Martic
Victoria Krakovna
Pedro A. Ortega
Tom Everitt
Andrew Lefrancq
Laurent Orseau
Shane Legg
271
275
0
27 Nov 2017
Training Confidence-calibrated Classifiers for Detecting Out-of-Distribution Samples
Kimin Lee
Honglak Lee
Kibok Lee
Jinwoo Shin
OODD
420
921
0
26 Nov 2017
Ethical Challenges in Data-Driven Dialogue Systems
Peter Henderson
Koustuv Sinha
Nicolas Angelard-Gontier
Nan Rosemary Ke
G. Fried
Ryan J. Lowe
Joelle Pineau
153
183
0
24 Nov 2017
Safer Classification by Synthesis
William Wang
Angelina Wang
Aviv Tamar
Xi Chen
Pieter Abbeel
164
41
0
22 Nov 2017
From Algorithmic Black Boxes to Adaptive White Boxes: Declarative Decision-Theoretic Ethical Programs as Codes of Ethics
M. V. Otterlo
48
10
0
16 Nov 2017
Good and safe uses of AI Oracles
Stuart Armstrong
Xavier O'Rourke
279
29
0
15 Nov 2017
Self-Regulating Artificial General Intelligence
J. Gans
88
9
0
12 Nov 2017
"Dave...I can assure you...that it's going to be all right..." -- A definition, case for, and survey of algorithmic assurances in human-autonomy trust relationships
Brett W. Israelsen
Nisar R. Ahmed
256
94
0
08 Nov 2017
Inverse Reward Design
Dylan Hadfield-Menell
S. Milli
Pieter Abbeel
Stuart J. Russell
Anca Dragan
207
441
0
08 Nov 2017
Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
Justin Fu
Katie Z Luo
Sergey Levine
317
811
0
30 Oct 2017
How Should a Robot Assess Risk? Towards an Axiomatic Theory of Risk in Robotics
Anirudha Majumdar
Marco Pavone
223
211
0
30 Oct 2017
PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples
International Conference on Learning Representations (ICLR), 2017
Yang Song
Taesup Kim
Sebastian Nowozin
Stefano Ermon
Nate Kushman
AAML
345
819
0
30 Oct 2017
Safety-Aware Apprenticeship Learning
International Conference on Computer Aided Verification (CAV), 2017
Weichao Zhou
Wenchao Li
206
35
0
22 Oct 2017
Bayesian Hypernetworks
David M. Krueger
Chin-Wei Huang
Riashat Islam
Ryan Turner
Alexandre Lacoste
Aaron Courville
UQCV, BDL
169
143
0
13 Oct 2017
Distance-based Confidence Score for Neural Network Classifiers
Amit Mandelbaum
D. Weinshall
UQCV
180
116
0
28 Sep 2017
A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks
Xiao Li
Yao Ma
C. Belta
153
61
0
27 Sep 2017
DropoutDAgger: A Bayesian Approach to Safe Imitation Learning
Kunal Menda
Katherine Driggs-Campbell
Mykel J. Kochenderfer
153
32
0
18 Sep 2017
An Analysis of ISO 26262: Using Machine Learning Safely in Automotive Software
Rick Salay
Rodrigo Queiroz
Krzysztof Czarnecki
137
142
0
07 Sep 2017
Uncertainty-Aware Learning from Demonstration using Mixture Density Networks with Sampling-Free Variance Modeling
Sungjoon Choi
Kyungjae Lee
Sungbin Lim
Songhwai Oh
164
108
0
03 Sep 2017
On Ensuring that Intelligent Machines Are Well-Behaved
Philip S. Thomas
Bruno C. da Silva
A. Barto
Emma Brunskill
FaML
116
16
0
17 Aug 2017
Attacking Automatic Video Analysis Algorithms: A Case Study of Google Cloud Video Intelligence API
Hossein Hosseini
Baicen Xiao
Andrew Clark
Radha Poovendran
AAML
154
25
0
14 Aug 2017
Robust Computer Algebra, Theorem Proving, and Oracle AI
G. Sarma
Nick J. Hay
113
4
0
08 Aug 2017
"I can assure you [...] that it's going to be all right" -- A definition, case for, and survey of algorithmic assurances in human-autonomy trust relationships
Brett W. Israelsen
115
0
0
01 Aug 2017
Guidelines for Artificial Intelligence Containment
James Babcock
János Kramár
Roman V. Yampolskiy
115
35
0
24 Jul 2017
Pragmatic-Pedagogic Value Alignment
J. F. Fisac
Monica A. Gates
Jessica B. Hamrick
Chang-rui Liu
Dylan Hadfield-Menell
Malayandi Palaniappan
Dhruv Malik
S. Shankar Sastry
Thomas Griffiths
Anca Dragan
165
87
0
20 Jul 2017
Trial without Error: Towards Safe Reinforcement Learning via Human Intervention
William Saunders
Girish Sastry
Andreas Stuhlmuller
Owain Evans
OffRL
201
255
0
17 Jul 2017
Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning
Daniel S. Brown
S. Niekum
BDL, OffRL
234
45
0
03 Jul 2017
An In-Depth Analysis of Visual Tracking with Siamese Neural Networks
R. Pflugfelder
167
14
0
03 Jul 2017
Deep reinforcement learning from human preferences
Neural Information Processing Systems (NeurIPS), 2017
Paul Christiano
Jan Leike
Tom B. Brown
Miljan Martic
Shane Legg
Dario Amodei
1.3K
4,292
0
12 Jun 2017
Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks
International Conference on Learning Representations (ICLR), 2017
Shiyu Liang
Shouqing Yang
R. Srikant
UQCV, OODD
1.0K
2,298
0
08 Jun 2017
Constrained Policy Optimization
International Conference on Machine Learning (ICML), 2017
Joshua Achiam
David Held
Aviv Tamar
Pieter Abbeel
867
1,540
0
30 May 2017
Safe Model-based Reinforcement Learning with Stability Guarantees
Felix Berkenkamp
M. Turchetta
Angela P. Schoellig
Andreas Krause
499
919
0
23 May 2017
Reinforcement Learning with a Corrupted Reward Channel
Tom Everitt
Victoria Krakovna
Laurent Orseau
Marcus Hutter
Shane Legg
274
116
0
23 May 2017