ResearchTrend.AI

Concrete Problems in AI Safety

21 June 2016
Dario Amodei
C. Olah
Jacob Steinhardt
Paul Christiano
John Schulman
Dandelion Mané
arXiv: 1606.06565

Papers citing "Concrete Problems in AI Safety"

50 / 1,379 papers shown
Inductive Generalization in Reinforcement Learning from Specifications
Vignesh Subramanian
Rohit Kushwah
Subhajit Roy
Suguman Bansal
OffRL
326
1
0
05 Jun 2024
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
Rafael Rafailov
Yaswanth Chittepu
Ryan Park
Harshit S. Sikchi
Joey Hejna
Bradley Knox
Chelsea Finn
S. Niekum
362
98
0
05 Jun 2024
Feature contamination: Neural networks learn uncorrelated features and fail to generalize
Tianren Zhang
Chujie Zhao
Guanyu Chen
Yizhou Jiang
Feng Chen
OOD, MLT, OODD
434
9
0
05 Jun 2024
Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise
T. Pouplin
Alan Jeffares
Nabeel Seedat
Mihaela van der Schaar
751
7
0
05 Jun 2024
A Generalized Apprenticeship Learning Framework for Modeling Heterogeneous Student Pedagogical Strategies
Md Mirajul Islam
Xi Yang
J. Hostetter
Adittya Soukarjya Saha
Min Chi
235
2
0
04 Jun 2024
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Philip Anastassiou
Jiawei Chen
Jingshu Chen
Yuanzhe Chen
Zhuo Chen
...
Wenjie Zhang
Yanzhe Zhang
Zilin Zhao
Dejian Zhong
Xiaobin Zhuang
310
250
0
04 Jun 2024
SaVeR: Optimal Data Collection Strategy for Safe Policy Evaluation in Tabular MDP
Subhojyoti Mukherjee
Josiah P. Hanna
Robert Nowak
OffRL
233
0
0
04 Jun 2024
Zero-Shot Out-of-Distribution Detection with Outlier Label Exposure
Choubo Ding
Guansong Pang
OODD, VLM
244
6
0
03 Jun 2024
Policy Verification in Stochastic Dynamical Systems Using Logarithmic Neural Certificates
Thom S. Badings
Wietze Koops
Sebastian Junges
Nils Jansen
416
0
0
02 Jun 2024
Investigating Calibration and Corruption Robustness of Post-hoc Pruned Perception CNNs: An Image Classification Benchmark Study
Pallavi Mitra
Gesina Schwalbe
Nadja Klein
AAML
210
4
0
31 May 2024
AI Safety: A Climb To Armageddon?
H. Cappelen
J. Dever
John Hawthorne
101
2
0
30 May 2024
AI Risk Management Should Incorporate Both Safety and Security
Xiangyu Qi
Yangsibo Huang
Yi Zeng
Edoardo Debenedetti
Jonas Geiping
...
Chaowei Xiao
Yue Liu
Dawn Song
Peter Henderson
Prateek Mittal
AAML
271
20
0
29 May 2024
Offline Regularised Reinforcement Learning for Large Language Models Alignment
Pierre Harvey Richemond
Yunhao Tang
Daniel Guo
Daniele Calandriello
M. G. Azar
...
Gil Shamir
Rishabh Joshi
Tianqi Liu
Rémi Munos
Bilal Piot
OffRL
238
41
0
29 May 2024
Efficient Model-agnostic Alignment via Bayesian Persuasion
Fengshuo Bai
Mingzhi Wang
Zhaowei Zhang
Boyuan Chen
Yinda Xu
Ying Wen
Yaodong Yang
282
9
0
29 May 2024
Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding
Daniel Bethell
Simos Gerasimou
R. Calinescu
Calum Imrie
OffRL, OnRL
469
1
0
28 May 2024
Exploring and steering the moral compass of Large Language Models
Alejandro Tlaie
LLMSV
223
6
0
27 May 2024
WeiPer: OOD Detection using Weight Perturbations of Class Projections
Maximilian Granz
Manuel Heurich
Tim Landgraf
OODD
317
3
0
27 May 2024
Crafting Interpretable Embeddings by Asking LLMs Questions
Vinamra Benara
Chandan Singh
John X. Morris
Richard Antonello
Ion Stoica
Alexander G. Huth
Jianfeng Gao
239
11
0
26 May 2024
Pragmatic Feature Preferences: Learning Reward-Relevant Preferences from Human Input
International Conference on Machine Learning (ICML), 2024
Andi Peng
Yuying Sun
Tianmin Shu
David Abel
223
4
0
23 May 2024
Similarity-Navigated Conformal Prediction for Graph Neural Networks
Neural Information Processing Systems (NeurIPS), 2024
Jianqing Song
Jianguo Huang
Wenyu Jiang
Baoming Zhang
Shuangjie Li
Chongjun Wang
317
7
0
23 May 2024
Online Self-Preferring Language Models
Yuanzhao Zhai
Zhuo Zhang
Kele Xu
Hanyang Peng
Yue Yu
Dawei Feng
Cheng Yang
Bo Ding
Huaimin Wang
179
0
0
23 May 2024
Human-AI Safety: A Descendant of Generative AI and Control Systems Safety
Andrea V. Bajcsy
J. F. Fisac
248
8
0
16 May 2024
Understanding the performance gap between online and offline alignment algorithms
Yunhao Tang
Daniel Guo
Zeyu Zheng
Daniele Calandriello
Yuan Cao
...
Rémi Munos
Bernardo Avila-Pires
Michal Valko
Yong Cheng
Will Dabney
OffRL, OnRL
294
93
0
14 May 2024
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
David Dalrymple
Joar Skalse
Yoshua Bengio
Stuart J. Russell
Max Tegmark
...
Clark Barrett
Ding Zhao
Zhi-Xuan Tan
Jeannette Wing
Joshua Tenenbaum
350
92
0
10 May 2024
One vs. Many: Comprehending Accurate Information from Multiple Erroneous and Inconsistent AI Generations
Conference on Fairness, Accountability and Transparency (FAccT), 2024
Yoonjoo Lee
Kihoon Son
Tae Soo Kim
Jisu Kim
John Joon Young Chung
Eytan Adar
Juho Kim
230
28
0
09 May 2024
Interpretable Cross-Examination Technique (ICE-T): Using highly informative features to boost LLM performance
Goran Muric
Ben Delay
Steven Minton
140
1
0
08 May 2024
Hybrid Convolutional Neural Networks with Reliability Guarantee
Hans Dermot Doran
Suzana Veljanovska
333
2
0
08 May 2024
Compressed Latent Replays for Lightweight Continual Learning on Spiking Neural Networks
Alberto Dequino
Alessio Carpegna
D. Nadalini
Alessandro Savino
Luca Benini
S. Di Carlo
Francesco Conti
264
3
0
08 May 2024
The Elephant in the Room -- Why AI Safety Demands Diverse Teams
David Rostcheck
Lara Scheibling
167
1
0
07 May 2024
Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning
Stone Tao
Arth Shukla
Tse-kai Chan
Hao Su
OffRL
222
9
0
06 May 2024
Semantic Objective Functions: A distribution-aware method for adding logical constraints in deep learning
International Conference on Agents and Artificial Intelligence (ICAART), 2024
Miguel Ángel Méndez Lucero
Enrique Bojorquez Gallardo
Vaishak Belle
183
2
0
03 May 2024
Generative AI in Cybersecurity
Shivani Metta
Isaac Chang
Jack Parker
Michael P. Roman
Arturo F. Ehuan
148
10
0
02 May 2024
Explainable AI (XAI) in Image Segmentation in Medicine, Industry, and Beyond: A Survey
ICT Express (IE), 2024
Rokas Gipiškis
Chun-Wei Tsai
Olga Kurasova
333
30
0
02 May 2024
Efficient Exploration of Image Classifier Failures with Bayesian Optimization and Text-to-Image Models
Adrien Le Coz
Houssem Ouertatani
Stéphane Herbin
Faouzi Adjed
179
0
0
26 Apr 2024
Taming False Positives in Out-of-Distribution Detection with Human Feedback
Harit Vishwakarma
Heguang Lin
Ramya Korlakai Vinayak
OODD
266
9
0
25 Apr 2024
Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents
Giorgio Piatti
Zhijing Jin
Max Kleiman-Weiner
Bernhard Schölkopf
Mrinmaya Sachan
Amélie Reymond
LLMAG
396
53
0
25 Apr 2024
FedSI: Federated Subnetwork Inference for Efficient Uncertainty Quantification
Hui Chen
Hengyu Liu
Zhangkai Wu
Xuhui Fan
LongBing Cao
FedML
209
2
0
24 Apr 2024
Stepwise Alignment for Constrained Language Model Policy Optimization
Akifumi Wachi
Thien Q. Tran
Rei Sato
Takumi Tanabe
Yohei Akimoto
255
17
0
17 Apr 2024
Toward a Realistic Benchmark for Out-of-Distribution Detection
Pietro Recalcati
Fabio Garcea
Luca Piano
Fabrizio Lamberti
Lia Morra
OODD
286
1
0
16 Apr 2024
Best Practices and Lessons Learned on Synthetic Data for Language Models
Ruibo Liu
Jerry W. Wei
Fangyu Liu
Chenglei Si
Yanzhe Zhang
...
Steven Zheng
Daiyi Peng
Diyi Yang
Denny Zhou
Andrew M. Dai
SyDa, EgoV
304
112
0
11 Apr 2024
Reducing Human-Robot Goal State Divergence with Environment Design
Kelsey Sikes
Sarah Keren
S. Sreedharan
210
2
0
10 Apr 2024
Automatic Authorities: Power and AI
Seth Lazar
122
3
0
09 Apr 2024
Deep Learning-Based Out-of-distribution Source Code Data Identification: How Far Have We Gone?
Van Nguyen
Xingliang Yuan
Tingmin Wu
Surya Nepal
M. Grobler
Carsten Rudolph
229
2
0
09 Apr 2024
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Corby Rosset
Ching-An Cheng
Arindam Mitra
Michael Santacroce
Ahmed Hassan Awadallah
Tengyang Xie
472
155
0
04 Apr 2024
Laser Learning Environment: A new environment for coordination-critical multi-agent tasks
Yannick Molinghen
Raphael Avalos
Mark Van Achter
A. Nowé
Tom Lenaerts
213
1
0
04 Apr 2024
Regularized Best-of-N Sampling with Minimum Bayes Risk Objective for Language Model Alignment
Yuu Jinnai
Tetsuro Morimura
Kaito Ariu
Kenshi Abe
443
3
0
01 Apr 2024
Coverage-Guaranteed Prediction Sets for Out-of-Distribution Data
Xin Zou
Weiwei Liu
201
3
0
29 Mar 2024
Open-Set Recognition in the Age of Vision-Language Models
Dimity Miller
Niko Sünderhauf
Alex Kenna
Keita Mason
VLM
250
10
0
25 Mar 2024
Scaling Learning based Policy Optimization for Temporal Tasks via Dropout
Navid Hashemi
Bardh Hoxha
Danil Prokhorov
Georgios Fainekos
Jyotirmoy Deshmukh
178
2
0
23 Mar 2024
On the Detection of Anomalous or Out-Of-Distribution Data in Vision Models Using Statistical Techniques
Laura O'Mahony
David JP O'Sullivan
Nikola S. Nikolov
188
1
0
21 Mar 2024