Concrete Problems in AI Safety

21 June 2016
Dario Amodei
C. Olah
Jacob Steinhardt
Paul Christiano
John Schulman
Dandelion Mané

Papers citing "Concrete Problems in AI Safety"

50 / 476 papers shown
Scoring Rules for Performative Binary Prediction
Alan Chan
31
1
0
05 Jul 2022
Shifts 2.0: Extending The Dataset of Real Distributional Shifts
A. Malinin
A. Athanasopoulos
M. Barakovic
Meritxell Bach Cuadra
Mark Gales
...
Francesco La Rosa
Eli Sivena
V. Tsarsitalidis
Efi Tsompopoulou
E. Volf
OOD
30
28
0
30 Jun 2022
How to talk so AI will learn: Instructions, descriptions, and autonomy
T. Sumers
Robert D. Hawkins
Mark K. Ho
Thomas Griffiths
Dylan Hadfield-Menell
LM&Ro
36
20
0
16 Jun 2022
Self-critiquing models for assisting human evaluators
William Saunders
Catherine Yeh
Jeff Wu
Steven Bills
Long Ouyang
Jonathan Ward
Jan Leike
ALM
ELM
29
282
0
12 Jun 2022
Density Regression and Uncertainty Quantification with Bayesian Deep Noise Neural Networks
Daiwei Zhang
Tianci Liu
Jian Kang
BDL
UQCV
40
2
0
12 Jun 2022
Bayesian Active Learning for Scanning Probe Microscopy: from Gaussian Processes to Hypothesis Learning
M. Ziatdinov
Yongtao Liu
K. Kelley
Rama K Vasudevan
Sergei V. Kalinin
AI4CE
47
49
0
30 May 2022
Safety Aware Changepoint Detection for Piecewise i.i.d. Bandits
Subhojyoti Mukherjee
21
1
0
27 May 2022
Eye-gaze-guided Vision Transformer for Rectifying Shortcut Learning
Chong Ma
Lin Zhao
Yuzhong Chen
Lu Zhang
Zhe Xiao
...
Tuo Zhang
Qian Wang
Dinggang Shen
Dajiang Zhu
Tianming Liu
ViT
MedIm
42
30
0
25 May 2022
Non-Programmers Can Label Programs Indirectly via Active Examples: A Case Study with Text-to-SQL
Ruiqi Zhong
Charles Burton Snell
Dan Klein
Jason Eisner
24
8
0
25 May 2022
Reward Uncertainty for Exploration in Preference-based Reinforcement Learning
Xinran Liang
Katherine Shu
Kimin Lee
Pieter Abbeel
21
58
0
24 May 2022
Learning Stabilizing Policies in Stochastic Control Systems
Dorde Zikelic
Mathias Lechner
K. Chatterjee
T. Henzinger
29
3
0
24 May 2022
A Review of Safe Reinforcement Learning: Methods, Theory and Applications
Shangding Gu
Longyu Yang
Yali Du
Guang Chen
Florian Walter
Jun Wang
Alois C. Knoll
OffRL
AI4TS
117
241
0
20 May 2022
A Generalist Agent
Scott E. Reed
Konrad Zolna
Emilio Parisotto
Sergio Gomez Colmenarejo
Alexander Novikov
...
Yutian Chen
R. Hadsell
Oriol Vinyals
Mahyar Bordbar
Nando de Freitas
LM&Ro
LLMAG
AI4CE
95
793
0
12 May 2022
Norm-Scaling for Out-of-Distribution Detection
Deepak Ravikumar
Kaushik Roy
OODD
UQCV
24
2
0
06 May 2022
Adversarial Training for High-Stakes Reliability
Daniel M. Ziegler
Seraphina Nix
Lawrence Chan
Tim Bauman
Peter Schmidt-Nielsen
...
Noa Nabeshima
Benjamin Weinstein-Raun
D. Haas
Buck Shlegeris
Nate Thomas
AAML
38
59
0
03 May 2022
A Simple Approach to Improve Single-Model Deep Uncertainty via Distance-Awareness
J. Liu
Shreyas Padhy
Jie Jessie Ren
Zi Lin
Yeming Wen
Ghassen Jerfel
Zachary Nado
Jasper Snoek
Dustin Tran
Balaji Lakshminarayanan
UQCV
BDL
26
48
0
01 May 2022
Counterfactual harm
Jonathan G. Richens
R. Beard
Daniel H. Thompson
31
27
0
27 Apr 2022
Uncertainty-Aware Prediction of Battery Energy Consumption for Hybrid Electric Vehicles
Jihed Khiari
Cristina Olaverri-Monreal
27
2
0
27 Apr 2022
Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?
Yuchen Cui
S. Niekum
Abhi Gupta
Vikash Kumar
Aravind Rajeswaran
LM&Ro
30
74
0
23 Apr 2022
Certifiable Robot Design Optimization using Differentiable Programming
Charles Dawson
Chuchu Fan
25
7
0
22 Apr 2022
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
...
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
95
2,352
0
12 Apr 2022
Effective Out-of-Distribution Detection in Classifier Based on PEDCC-Loss
Qiuyu Zhu
Guohui Zheng
Yingying Yan
OODD
22
8
0
10 Apr 2022
Learning Confidence for Transformer-based Neural Machine Translation
Yu Lu
Jiali Zeng
Jiajun Zhang
Shuangzhi Wu
Mu Li
41
9
0
22 Mar 2022
Human-Centric Artificial Intelligence Architecture for Industry 5.0 Applications
Jože M. Rožanec
I. Novalija
Patrik Zajec
K. Kenda
Hooman Tavakoli
...
G. Sofianidis
Spyros Theodoropoulos
Blaž Fortuna
Dunja Mladenić
John Soldatos
3DV
AI4CE
38
121
0
21 Mar 2022
Leveraging Adversarial Examples to Quantify Membership Information Leakage
Ganesh Del Grosso
Hamid Jalalzai
Georg Pichler
C. Palamidessi
Pablo Piantanida
MIACV
36
21
0
17 Mar 2022
Evaluating Object (mis)Detection from a Safety and Reliability Perspective: Discussion and Measures
Andrea Ceccarelli
Leonardo Montecchi
27
10
0
04 Mar 2022
Finding Safe Zones of policies Markov Decision Processes
Lee Cohen
Yishay Mansour
Michal Moshkovitz
27
1
0
23 Feb 2022
Learning Behavioral Soft Constraints from Demonstrations
Arie Glazier
Andrea Loreggia
N. Mattei
Taher Rahgooy
F. Rossi
Brent Venable
21
5
0
21 Feb 2022
System Safety and Artificial Intelligence
Roel Dobbe
25
34
0
18 Feb 2022
Strategy Discovery and Mixture in Lifelong Learning from Heterogeneous Demonstration
Sravan Jayanthi
Letian Chen
Matthew C. Gombolay
30
0
0
14 Feb 2022
Red Teaming Language Models with Language Models
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
13
611
0
07 Feb 2022
Describing Differences between Text Distributions with Natural Language
Ruiqi Zhong
Charles Burton Snell
Dan Klein
Jacob Steinhardt
VLM
132
42
0
28 Jan 2022
A causal model of safety assurance for machine learning
Simon Burton
CML
32
5
0
14 Jan 2022
An Abstraction-Refinement Approach to Verifying Convolutional Neural Networks
Matan Ostrovsky
Clark W. Barrett
Guy Katz
40
26
0
06 Jan 2022
Out-of-distribution Detection with Boundary Aware Learning
Sen Pei
Xin Zhang
Bin Fan
Gaofeng Meng
OODD
21
8
0
22 Dec 2021
Safety-Aware Preference-Based Learning for Safety-Critical Control
Ryan K. Cosner
Maegan Tucker
Andrew J. Taylor
Kejun Li
Tamás G. Molnár
Wyatt Ubellacker
Anil Alan
G. Orosz
Yisong Yue
Aaron D. Ames
31
24
0
15 Dec 2021
Quantifying Multimodality in World Models
Andreas Sedlmeier
Michael Kölle
Robert Muller
Leo Baudrexel
Claudia Linnhoff-Popien
OffRL
22
1
0
14 Dec 2021
Programmatic Reward Design by Example
Weichao Zhou
Wenchao Li
34
15
0
14 Dec 2021
Causal-based Time Series Domain Generalization for Vehicle Intention Prediction
Yeping Hu
Xiaogang Jia
Masayoshi Tomizuka
Wei Zhan
OOD
40
25
0
03 Dec 2021
Learning Optimal Predictive Checklists
Haoran Zhang
Q. Morris
Berk Ustun
Marzyeh Ghassemi
26
11
0
02 Dec 2021
Reward-Free Attacks in Multi-Agent Reinforcement Learning
Ted Fujimoto
T. Doster
A. Attarian
Jill M. Brandenberger
Nathan Oken Hodas
AAML
24
4
0
02 Dec 2021
Data Invariants to Understand Unsupervised Out-of-Distribution Detection
Lars Doorenbos
Raphael Sznitman
Pablo Márquez-Neila
OODD
27
6
0
26 Nov 2021
Scalar reward is not enough: A response to Silver, Singh, Precup and Sutton (2021)
Peter Vamplew
Benjamin J. Smith
Johan Källström
G. Ramos
Roxana Rădulescu
...
Fredrik Heintz
Patrick Mannion
Pieter J. K. Libin
Richard Dazeley
Cameron Foale
LRM
29
66
0
25 Nov 2021
Hierarchical Graph-Convolutional Variational AutoEncoding for Generative Modelling of Human Motion
Anthony Bourached
Robert J. Gray
Xiaodong Guan
Ryan-Rhys Griffiths
A. Jha
P. Nachev
3DH
DRL
14
1
0
24 Nov 2021
A Survey on AI Assurance
Feras A. Batarseh
Laura J. Freeman
31
65
0
15 Nov 2021
RLOps: Development Life-cycle of Reinforcement Learning Aided Open RAN
Peizheng Li
Jonathan D. Thomas
Xiaoyang Wang
Ahmed Khalil
A. Ahmad
...
S. Kapoor
Arjun Parekh
A. Doufexi
Arman Shojaeifard
Robert Piechocki
AI4TS
14
37
0
12 Nov 2021
Statistical Perspectives on Reliability of Artificial Intelligence Systems
Yili Hong
J. Lian
Li Xu
Jie Min
Yueyao Wang
Laura J. Freeman
Xinwei Deng
35
30
0
09 Nov 2021
Evaluating Predictive Uncertainty and Robustness to Distributional Shift Using Real World Data
Kumud Lakara
Akshat Bhandari
Pratinav Seth
Ujjwal Verma
OOD
29
3
0
08 Nov 2021
B-Pref: Benchmarking Preference-Based Reinforcement Learning
Kimin Lee
Laura M. Smith
Anca Dragan
Pieter Abbeel
OffRL
40
93
0
04 Nov 2021
Model-Free Risk-Sensitive Reinforcement Learning
Grégoire Delétang
Jordi Grau-Moya
M. Kunesch
Tim Genewein
Rob Brekelmans
Shane Legg
Pedro A. Ortega
OOD
10
9
0
04 Nov 2021