Concrete Problems in AI Safety

21 June 2016

Papers citing "Concrete Problems in AI Safety"

50 / 1,374 papers shown

Title
FPR -- Fast Path Risk Algorithm to Evaluate Collision Probability A. Blake Alejandro Bordallo Kamen Brestnichki Majd Hawasly Svetlin Penkov S. Ramamoorthy Alexandre Silva 85 6 0 15 Apr 2018
Incomplete Contracting and AI Alignment Dylan Hadfield-Menell Gillian Hadfield 170 97 0 12 Apr 2018
Toward Intelligent Vehicular Networks: A Machine Learning Framework Le Liang Hao Ye Geoffrey Ye Li 130 220 0 01 Apr 2018
Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning Nicolas Papernot Patrick McDaniel OOD AAML 320 545 0 13 Mar 2018
The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities Joel Lehman Jeff Clune D. Misevic C. Adami L. Altenberg ... Danesh Tarapore S. Thibault Westley Weimer R. Watson Jason Yosinski 413 298 0 09 Mar 2018
Predictive Uncertainty Estimation via Prior Networks Neural Information Processing Systems (NeurIPS), 2018 A. Malinin Mark Gales UD BDL EDL UQCV PER 473 1,028 0 28 Feb 2018
Semantic Vector Spaces for Broadening Consideration of Consequences Douglas Summers Stay 49 0 0 23 Feb 2018
The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation Miles Brundage S. Avin Jack Clark H. Toner P. Eckersley ... Owain Evans Michael Page Joanna J. Bryson Roman V. Yampolskiy Dario Amodei 187 810 0 20 Feb 2018
Learning Confidence for Out-of-Distribution Detection in Neural Networks Terrance Devries Graham W. Taylor OOD OODD 250 628 0 13 Feb 2018
Learning from Richer Human Guidance: Augmenting Comparison-Based Learning with Feature Queries Chandrayee Basu M. Singhal Anca Dragan 173 60 0 05 Feb 2018
A Cyber Science Based Ontology for Artificial General Intelligence Containment Jason M. Pittman Courtney E. Soboleski 188 3 0 28 Jan 2018
Safe Policy Improvement with Baseline Bootstrapping Romain Laroche P. Trichelair Rémi Tachet des Combes OffRL 357 212 0 19 Dec 2017
'Indifference' methods for managing agent rewards Stuart Armstrong Xavier O'Rourke 214 20 0 18 Dec 2017
Occam's razor is insufficient to infer the preferences of irrational agents Stuart Armstrong Sören Mindermann 475 93 0 15 Dec 2017
A Low-Cost Ethics Shaping Approach for Designing Reinforcement Learning Agents Yueh-hua Wu Shou-De Lin OffRL 142 71 0 12 Dec 2017
AI Safety and Reproducibility: Establishing Robust Foundations for the Neuropsychology of Human Values G. Sarma Nick J. Hay A. Safron 99 13 0 08 Dec 2017
Safe Exploration for Identifying Linear Systems via Robust Optimization Tyler Lu Martin A. Zinkevich Craig Boutilier Binz Roy Dale Schuurmans 85 6 0 30 Nov 2017
AI Safety Gridworlds Jan Leike Miljan Martic Victoria Krakovna Pedro A. Ortega Tom Everitt Andrew Lefrancq Laurent Orseau Shane Legg 271 275 0 27 Nov 2017
Training Confidence-calibrated Classifiers for Detecting Out-of-Distribution Samples Kimin Lee Honglak Lee Kibok Lee Jinwoo Shin OODD 420 921 0 26 Nov 2017
Ethical Challenges in Data-Driven Dialogue Systems Peter Henderson Koustuv Sinha Nicolas Angelard-Gontier Nan Rosemary Ke G. Fried Ryan J. Lowe Joelle Pineau 153 183 0 24 Nov 2017
Safer Classification by Synthesis William Wang Angelina Wang Aviv Tamar Xi Chen Pieter Abbeel 164 41 0 22 Nov 2017
From Algorithmic Black Boxes to Adaptive White Boxes: Declarative Decision-Theoretic Ethical Programs as Codes of Ethics M. V. Otterlo 48 10 0 16 Nov 2017
Good and safe uses of AI Oracles Stuart Armstrong Xavier O'Rourke 279 29 0 15 Nov 2017
Self-Regulating Artificial General Intelligence J. Gans 88 9 0 12 Nov 2017
"Dave...I can assure you...that it's going to be all right..." -- A definition, case for, and survey of algorithmic assurances in human-autonomy trust relationships Brett W. Israelsen Nisar R. Ahmed 256 94 0 08 Nov 2017
Inverse Reward Design Dylan Hadfield-Menell S. Milli Pieter Abbeel Stuart J. Russell Anca Dragan 207 441 0 08 Nov 2017
Learning Robust Rewards with Adversarial Inverse Reinforcement Learning Justin Fu Katie Z Luo Sergey Levine 317 811 0 30 Oct 2017
How Should a Robot Assess Risk? Towards an Axiomatic Theory of Risk in Robotics Anirudha Majumdar Marco Pavone 223 211 0 30 Oct 2017
PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples International Conference on Learning Representations (ICLR), 2017 Yang Song Taesup Kim Sebastian Nowozin Stefano Ermon Nate Kushman AAML 345 819 0 30 Oct 2017
Safety-Aware Apprenticeship Learning International Conference on Computer Aided Verification (CAV), 2017 Weichao Zhou Wenchao Li 206 35 0 22 Oct 2017
Bayesian Hypernetworks David M. Krueger Chin-Wei Huang Riashat Islam Ryan Turner Alexandre Lacoste Aaron Courville UQCV BDL 169 143 0 13 Oct 2017
Distance-based Confidence Score for Neural Network Classifiers Amit Mandelbaum D. Weinshall UQCV 180 116 0 28 Sep 2017
A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks Xiao Li Yao Ma C. Belta 153 61 0 27 Sep 2017
DropoutDAgger: A Bayesian Approach to Safe Imitation Learning Kunal Menda Katherine Driggs-Campbell Mykel J. Kochenderfer 153 32 0 18 Sep 2017
An Analysis of ISO 26262: Using Machine Learning Safely in Automotive Software Rick Salay Rodrigo Queiroz Krzysztof Czarnecki 137 142 0 07 Sep 2017
Uncertainty-Aware Learning from Demonstration using Mixture Density Networks with Sampling-Free Variance Modeling Sungjoon Choi Kyungjae Lee Sungbin Lim Songhwai Oh 164 108 0 03 Sep 2017
On Ensuring that Intelligent Machines Are Well-Behaved Philip S. Thomas Bruno C. da Silva A. Barto Emma Brunskill FaML 116 16 0 17 Aug 2017
Attacking Automatic Video Analysis Algorithms: A Case Study of Google Cloud Video Intelligence API Hossein Hosseini Baicen Xiao Andrew Clark Radha Poovendran AAML 154 25 0 14 Aug 2017
Robust Computer Algebra, Theorem Proving, and Oracle AI G. Sarma Nick J. Hay 113 4 0 08 Aug 2017
"I can assure you [...] that it's going to be all right" -- A definition, case for, and survey of algorithmic assurances in human-autonomy trust relationships Brett W. Israelsen 115 0 0 01 Aug 2017
Guidelines for Artificial Intelligence Containment James Babcock János Kramár Roman V. Yampolskiy 115 35 0 24 Jul 2017
Pragmatic-Pedagogic Value Alignment J. F. Fisac Monica A. Gates Jessica B. Hamrick Chang-rui Liu Dylan Hadfield-Menell Malayandi Palaniappan Dhruv Malik S. Shankar Sastry Thomas Griffiths Anca Dragan 165 87 0 20 Jul 2017
Trial without Error: Towards Safe Reinforcement Learning via Human Intervention William Saunders Girish Sastry Andreas Stuhlmuller Owain Evans OffRL 201 255 0 17 Jul 2017
Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning Daniel S. Brown S. Niekum BDL OffRL 234 45 0 03 Jul 2017
An In-Depth Analysis of Visual Tracking with Siamese Neural Networks R. Pflugfelder 167 14 0 03 Jul 2017
Deep reinforcement learning from human preferences Neural Information Processing Systems (NeurIPS), 2017 Paul Christiano Jan Leike Tom B. Brown Miljan Martic Shane Legg Dario Amodei 1.3K 4,292 0 12 Jun 2017
Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks International Conference on Learning Representations (ICLR), 2017 Shiyu Liang Shouqing Yang R. Srikant UQCV OODD 1.0K 2,298 0 08 Jun 2017
Constrained Policy Optimization International Conference on Machine Learning (ICML), 2017 Joshua Achiam David Held Aviv Tamar Pieter Abbeel 867 1,540 0 30 May 2017
Safe Model-based Reinforcement Learning with Stability Guarantees Felix Berkenkamp M. Turchetta Angela P. Schoellig Andreas Krause 499 919 0 23 May 2017
Reinforcement Learning with a Corrupted Reward Channel Tom Everitt Victoria Krakovna Laurent Orseau Marcus Hutter Shane Legg 274 116 0 23 May 2017