Concrete Problems in AI Safety

21 June 2016
Dario Amodei
C. Olah
Jacob Steinhardt
Paul Christiano
John Schulman
Dandelion Mané
arXiv: 1606.06565

Papers citing "Concrete Problems in AI Safety"

50 / 1,371 papers shown
Towards Distribution-Agnostic Generalized Category Discovery
Neural Information Processing Systems (NeurIPS), 2023
Jianhong Bai
Zuo-Qiang Liu
Hualiang Wang
Ruizhe Chen
Lianrui Mu
Xiaomeng Li
Qiufeng Wang
Yang Feng
Jian Wu
Haoji Hu
332
18
0
02 Oct 2023
Unified Uncertainty Calibration
Kamalika Chaudhuri
David Lopez-Paz
258
0
0
02 Oct 2023
Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals
International Conference on Learning Representations (ICLR), 2023
Y. Gat
Nitay Calderon
Amir Feder
Alexander Chapanin
Amit Sharma
Roi Reichart
320
42
0
01 Oct 2023
Intrinsic Language-Guided Exploration for Complex Long-Horizon Robotic Manipulation Tasks
IEEE International Conference on Robotics and Automation (ICRA), 2023
Wenke Huang
Filippos Christianos
Zhibin Li
211
16
0
28 Sep 2023
Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints
International Conference on Learning Representations (ICLR), 2023
Simon Mahns
Yibo Jiang
Yuguang Yang
Han Liu
Yuxin Chen
182
141
0
28 Sep 2023
CoinRun: Solving Goal Misgeneralisation
Stuart Armstrong
Alexandre Maranhao
Oliver Daniels-Koch
Ioannis Gkioulekas
Rebecca Gormann
LRM
125
0
0
28 Sep 2023
Large Language Model Alignment: A Survey
Shangda Wu
Renren Jin
Yufei Huang
Chuang Liu
Weilong Dong
Zishan Guo
Xinwei Wu
Yan Liu
Deyi Xiong
LM&MA
288
271
0
26 Sep 2023
Hierarchical Imitation Learning for Stochastic Environments
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
Maximilian Igl
Punit Shah
Paul Mougin
S. Srinivasan
Tarun Gupta
Brandyn White
K. Shiarlis
Shimon Whiteson
OOD
149
3
0
25 Sep 2023
LMC: Large Model Collaboration with Cross-assessment for Training-Free Open-Set Object Recognition
Neural Information Processing Systems (NeurIPS), 2023
Haoxuan Qu
Xiaofei Hui
Yujun Cai
Jun Liu
522
16
0
22 Sep 2023
Learning to Recover for Safe Reinforcement Learning
Haoyu Wang
Xin Yuan
Qinqing Ren
148
0
0
21 Sep 2023
You can have your ensemble and run it too -- Deep Ensembles Spread Over Time
Isak Meding
Alexander Bodin
Adam Tonderski
Joakim Johnander
Christoffer Petersson
Lennart Svensson
OOD UQCV
103
1
0
20 Sep 2023
Probabilistic Safety Regions Via Finite Families of Scalable Classifiers
Alberto Carlevaro
Teodoro Alamo
Fabrizio Dabbene
Maurizio Mongelli
132
2
0
08 Sep 2023
Learning Active Subspaces for Effective and Scalable Uncertainty Quantification in Deep Neural Networks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Sanket Jantre
Nathan M. Urban
Xiaoning Qian
Byung-Jun Yoon
BDL UQCV
186
9
0
06 Sep 2023
On Reducing Undesirable Behavior in Deep Reinforcement Learning Models
Ophir M. Carmel
Guy Katz
195
0
0
06 Sep 2023
Iterative Reward Shaping using Human Feedback for Correcting Reward Misspecification
European Conference on Artificial Intelligence (ECAI), 2023
Jasmina Gajcin
J. McCarthy
Rahul Nair
Radu Marinescu
Elizabeth M. Daly
Ivana Dusparic
196
6
0
30 Aug 2023
RecRec: Algorithmic Recourse for Recommender Systems
International Conference on Information and Knowledge Management (CIKM), 2023
Sahil Verma
Ashudeep Singh
Varich Boonsanong
John P. Dickerson
Chirag Shah
216
5
0
28 Aug 2023
The Promise and Peril of Artificial Intelligence -- Violet Teaming Offers a Balanced Path Forward
A. Titus
Adam Russell
226
4
0
28 Aug 2023
Queering the ethics of AI
E. Fosch-Villaronga
Gianclaudio Malgieri
141
2
0
25 Aug 2023
Language Reward Modulation for Pretraining Reinforcement Learning
Ademi Adeniji
Amber Xie
Carmelo Sferrazza
Younggyo Seo
Stephen James
Pieter Abbeel
169
39
0
23 Aug 2023
Trustworthy Representation Learning Across Domains
Ronghang Zhu
Dongliang Guo
Daiqing Qi
Zhixuan Chu
Xiang Yu
Sheng Li
FaML AI4TS
237
2
0
23 Aug 2023
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Qingyun Wu
Gagan Bansal
Jieyu Zhang
Yiran Wu
Beibin Li
...
Jiale Liu
Ahmed Hassan Awadallah
Ryen W. White
Doug Burger
Chi Wang
LLMAG AI4CE
283
829
0
16 Aug 2023
Generating Personas for Games with Multimodal Adversarial Imitation Learning
William Ahlberg
Alessandro Sestini
Konrad Tollmar
Linus Gisslén
GAN
117
8
0
15 Aug 2023
Simple synthetic data reduces sycophancy in large language models
Jerry W. Wei
Da Huang
Yifeng Lu
Denny Zhou
Quoc V. Le
394
93
0
07 Aug 2023
Empirical Optimal Risk to Quantify Model Trustworthiness for Failure Detection
Shuang Ao
Stefan Rueger
Advaith Siddharthan
149
4
0
06 Aug 2023
Machine Learning for Infectious Disease Risk Prediction: A Survey
ACM Computing Surveys (ACM Comput. Surv.), 2023
Mutong Liu
Yang Liu
Jiming Liu
LM&MA AI4CE
121
6
0
06 Aug 2023
A Case for AI Safety via Law
Jeffrey W. Johnston
206
1
0
31 Jul 2023
Rating-based Reinforcement Learning
AAAI Conference on Artificial Intelligence (AAAI), 2023
Devin White
Mingkang Wu
Ellen R. Novoseller
Vernon J. Lawhern
Nicholas R. Waytowich
Yongcan Cao
ALM
202
12
0
30 Jul 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper
Xander Davies
Claudia Shi
T. Gilbert
Jérémy Scheurer
...
Erdem Biyik
Anca Dragan
David M. Krueger
Dorsa Sadigh
Dylan Hadfield-Menell
ALM OffRL
301
686
0
27 Jul 2023
Designing Fiduciary Artificial Intelligence
Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO), 2023
Sebastian Benthall
David Shekman
140
8
0
27 Jul 2023
Approximate Model-Based Shielding for Safe Reinforcement Learning
European Conference on Artificial Intelligence (ECAI), 2023
Alexander W. Goodall
Francesco Belardinelli
143
3
0
27 Jul 2023
Evaluating the Moral Beliefs Encoded in LLMs
Neural Information Processing Systems (NeurIPS), 2023
Nino Scherrer
Claudia Shi
Amir Feder
David M. Blei
207
191
0
26 Jul 2023
Of Models and Tin Men: A Behavioural Economics Study of Principal-Agent Problems in AI Alignment using Large-Language Models
S. Phelps
Rebecca E. Ranson
LLMAG
172
6
0
20 Jul 2023
Absolutist AI
Mitchell Barrington
86
0
0
19 Jul 2023
STRAPPER: Preference-based Reinforcement Learning via Self-training Augmentation and Peer Regularization
Yachen Kang
Li He
Jinxin Liu
Zifeng Zhuang
Xuetao Zhang
312
1
0
19 Jul 2023
Classical Out-of-Distribution Detection Methods Benchmark in Text Classification Tasks
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
M. Baran
Joanna Baran
Mateusz Wójcik
Maciej Ziȩba
Adam Gonczarek
OODD
279
8
0
13 Jul 2023
Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation
International Conference on Machine Learning (ICML), 2023
Andi Peng
Aviv Netanyahu
Mark K. Ho
Tianmin Shu
Andreea Bobu
J. Shah
Pulkit Agrawal
195
17
0
12 Jul 2023
Frontier AI Regulation: Managing Emerging Risks to Public Safety
Markus Anderljung
Joslyn Barnhart
Anton Korinek
Jade Leung
Cullen O'Keefe
...
Jonas Schuett
Yonadav Shavit
Divya Siddarth
Robert F. Trager
Kevin J. Wolf
SILM
297
150
0
06 Jul 2023
FOCUS: Object-Centric World Models for Robotics Manipulation
Stefano Ferraro
Pietro Mazzaglia
Tim Verbelen
Bart Dhoedt
OCL LM&Ro
232
17
0
05 Jul 2023
Trainable Transformer in Transformer
International Conference on Machine Learning (ICML), 2023
A. Panigrahi
Sadhika Malladi
Mengzhou Xia
Sanjeev Arora
VLM
292
13
0
03 Jul 2023
Empirically Validating Conformal Prediction on Modern Vision Architectures Under Distribution Shift and Long-tailed Data
Kevin Kasa
Graham W. Taylor
260
7
0
03 Jul 2023
Morse Neural Networks for Uncertainty Quantification
Benoit Dherin
Huiyi Hu
Jie Jessie Ren
Michael W. Dusenberry
Balaji Lakshminarayanan
UQCV AI4CE
129
5
0
02 Jul 2023
Safety-Aware Task Composition for Discrete and Continuous Reinforcement Learning
Kevin J. Leahy
Makai Mann
Zachary Serlin
CoGe OffRL
90
0
0
29 Jun 2023
Beyond AUROC & co. for evaluating out-of-distribution detection performance
Galadrielle Humblot-Renaux
Sergio Escalera
T. Moeslund
OODD
149
9
0
26 Jun 2023
A Cosine Similarity-based Method for Out-of-Distribution Detection
Nguyen Ngoc-Hieu
Nguyen Hung-Quang
The-Anh Ta
Thanh Nguyen-Tang
Khoa D. Doan
Hoang Thanh-Tung
OODD
66
3
0
23 Jun 2023
Beyond Deep Ensembles: A Large-Scale Evaluation of Bayesian Deep Learning under Distribution Shift
Neural Information Processing Systems (NeurIPS), 2023
Florian Seligmann
P. Becker
Michael Volpp
Gerhard Neumann
UQCV
282
22
0
21 Jun 2023
Evaluating Superhuman Models with Consistency Checks
Lukas Fluri
Daniel Paleka
Florian Tramèr
ELM
256
49
0
16 Jun 2023
Datasets and Benchmarks for Offline Safe Reinforcement Learning
Zuxin Liu
Zijian Guo
Haohong Lin
Yi-Fan Yao
Jiacheng Zhu
...
Hanjiang Hu
Wenhao Yu
Tingnan Zhang
Jie Tan
Ding Zhao
OffRL
242
51
0
15 Jun 2023
TASRA: a Taxonomy and Analysis of Societal-Scale Risks from AI
Andrew Critch
Stuart J. Russell
127
33
0
12 Jun 2023
Mitigating Transformer Overconfidence via Lipschitz Regularization
Conference on Uncertainty in Artificial Intelligence (UAI), 2023
Wenqian Ye
Yunsheng Ma
Xu Cao
Kun Tang
136
17
0
12 Jun 2023
Multi-Agent Reinforcement Learning Guided by Signal Temporal Logic Specifications
Jiangwei Wang
Shuo Yang
Ziyan An
Songyang Han
Zhili Zhang
Rahul Mangharam
Meiyi Ma
Fei Miao
206
11
0
11 Jun 2023