Concrete Problems in AI Safety

21 June 2016
Dario Amodei
C. Olah
Jacob Steinhardt
Paul Christiano
John Schulman
Dandelion Mané

Papers citing "Concrete Problems in AI Safety"

50 / 1,371 papers shown
Omega-Regular Decision Processes
AAAI Conference on Artificial Intelligence (AAAI), 2023
E. M. Hahn
Mateo Perez
S. Schewe
Fabio Somenzi
Ashutosh Trivedi
D. Wojtczak
123
1
0
14 Dec 2023
LLF-Bench: Benchmark for Interactive Learning from Language Feedback
Ching-An Cheng
Andrey Kolobov
Dipendra Kumar Misra
Allen Nie
Adith Swaminathan
210
24
0
11 Dec 2023
Modeling Risk in Reinforcement Learning: A Literature Mapping
Leonardo Villalobos-Arias
Derek Martin
Abhijeet Krishnan
Madeleine Gagné
Colin M. Potts
Arnav Jhala
221
0
0
08 Dec 2023
Deep Learning for Koopman-based Dynamic Movement Primitives
Tyler Han
Carl Glen Henshaw
124
0
0
06 Dec 2023
Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees
Neural Information Processing Systems (NeurIPS), 2023
Dorde Zikelic
Mathias Lechner
Abhinav Verma
K. Chatterjee
T. Henzinger
200
17
0
03 Dec 2023
A Multifidelity Sim-to-Real Pipeline for Verifiable and Compositional Reinforcement Learning
IEEE International Conference on Robotics and Automation (ICRA), 2023
Cyrus Neary
Christian Ellis
Aryaman Singh Samyal
Craig T. Lennon
Ufuk Topcu
OffRL
817
1
0
02 Dec 2023
Nash Learning from Human Feedback
International Conference on Machine Learning (ICML), 2023
Rémi Munos
Michal Valko
Daniele Calandriello
M. G. Azar
Mark Rowland
...
Nikola Momchev
Olivier Bachem
D. Mankowitz
Doina Precup
Bilal Piot
393
181
0
01 Dec 2023
Foundational Moral Values for AI Alignment
Betty Hou
Brian Patrick Green
140
1
0
28 Nov 2023
(Ir)rationality in AI: State of the Art, Research Challenges and Open Questions
Artificial Intelligence Review (AIR), 2023
Olivia Macmillan-Scott
Mirco Musolesi
335
3
0
28 Nov 2023
Survey on AI Ethics: A Socio-technical Perspective
International Conference on Climate Informatics (ICCI), 2023
Dave Mbiazi
Meghana Bhange
Maryam Babaei
Ivaxi Sheth
Patrik Kenfack
Samira Ebrahimi Kahou
291
8
0
28 Nov 2023
Exploring the Robustness of Model-Graded Evaluations and Automated Interpretability
Simon Lermen
Ondřej Kvapil
ELM, AAML
96
3
0
26 Nov 2023
Efficient Open-world Reinforcement Learning via Knowledge Distillation and Autonomous Rule Discovery
Ekaterina Nikonova
Cheng Xue
Jochen Renz
CLL
142
1
0
24 Nov 2023
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
David Rein
Betty Li Hou
Asa Cooper Stickland
Jackson Petty
Richard Yuanzhe Pang
Julien Dirani
Julian Michael
Samuel R. Bowman
AI4MH, ELM
349
1,531
0
20 Nov 2023
Towards Few-shot Out-of-Distribution Detection
Jiuqing Dong
Yongbin Gao
Heng Zhou
Jun Cen
Yifan Yao
Sook Yoon
Park Dong Sun
OODD
166
3
0
20 Nov 2023
Refining Perception Contracts: Case Studies in Vision-based Safe Auto-landing
Yangge Li
Benjamin C Yang
Yixuan Jia
Daniel Zhuang
Sayan Mitra
242
5
0
15 Nov 2023
Cooperative AI via Decentralized Commitment Devices
Xinyuan Sun
Davide Crapis
Matt Stephenson
B. Monnot
Thomas Thiery
Jonathan Passerat-Palmbach
197
13
0
14 Nov 2023
EviPrompt: A Training-Free Evidential Prompt Generation Method for Segment Anything Model in Medical Images
Yinsong Xu
Jiaqi Tang
Aidong Men
Qingchao Chen
VLM, MedIm
215
9
0
10 Nov 2023
Why Do Probabilistic Clinical Models Fail To Transport Between Sites?
Thomas A. Lasko
Eric V. Strobl
William W Stead
OOD
142
17
0
08 Nov 2023
Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models
Michael Lan
Phillip H. S. Torr
Fazl Barez
LRM
295
8
0
07 Nov 2023
SeRO: Self-Supervised Reinforcement Learning for Recovery from Out-of-Distribution Situations
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Chan Kim
JaeKyung Cho
C. Bobda
Seung-Woo Seo
Seong-Woo Kim
177
4
0
07 Nov 2023
CLIP-Motion: Learning Reward Functions for Robotic Actions Using Consecutive Observations
Xuzhe Dang
Stefan Edelkamp
394
7
0
06 Nov 2023
Reinforcement Learning for Safety Testing: Lessons from A Mobile Robot Case Study
Tom P. Huck
Martin Kaiser
Constantin Cronrath
Bengt Lennartson
Torsten Kröger
Tamim Asfour
105
1
0
06 Nov 2023
Online Non-convex Optimization with Long-term Non-convex Constraints
Shijie Pan
Jianyu Xu
Wenjie Huang
234
0
0
04 Nov 2023
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
Simon Lermen
Charlie Rogers-Smith
Jeffrey Ladish
ALM
200
136
0
31 Oct 2023
A Review of the Evidence for Existential Risk from AI via Misaligned Power-Seeking
Rose Hadshar
122
9
0
27 Oct 2023
Social Contract AI: Aligning AI Assistants with Implicit Group Norms
Jan-Philipp Fränken
Sam Kwok
Peixuan Ye
Kanishk Gandhi
Dilip Arumugam
Jared Moore
Alex Tamkin
Tobias Gerstenberg
Noah D. Goodman
230
9
0
26 Oct 2023
Multi-scale Diffusion Denoised Smoothing
Neural Information Processing Systems (NeurIPS), 2023
Jongheon Jeong
Jinwoo Shin
DiffM
249
13
0
25 Oct 2023
Can You Rely on Your Model Evaluation? Improving Model Evaluation with Synthetic Test Data
Neural Information Processing Systems (NeurIPS), 2023
B. V. Breugel
Nabeel Seedat
F. Imrie
M. Schaar
SyDa
159
35
0
25 Oct 2023
DePAint: A Decentralized Safe Multi-Agent Reinforcement Learning Algorithm considering Peak and Average Constraints
Raheeb Hassan
K. M. S. Wadith
Md. Mamun-or Rashid
Md. Mosaddek Khan
187
3
0
22 Oct 2023
LUNA: A Model-Based Universal Analysis Framework for Large Language Models
IEEE Transactions on Software Engineering (TSE), 2023
Da Song
Xuan Xie
Yuheng Huang
Derui Zhu
Yuheng Huang
Felix Juefei Xu
Lei Ma
ALM
266
9
0
22 Oct 2023
A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs
AAAI Conference on Artificial Intelligence (AAAI), 2023
Mateo Perez
Fabio Somenzi
Ashutosh Trivedi
333
10
0
18 Oct 2023
Understanding Reward Ambiguity Through Optimal Transport Theory in Inverse Reinforcement Learning
Ali Baheri
37
5
0
18 Oct 2023
Compositional preference models for aligning LMs
International Conference on Learning Representations (ICLR), 2023
Dongyoung Go
Tomasz Korbak
Germán Kruszewski
Jos Rozen
Marc Dymetman
247
25
0
17 Oct 2023
Factored Verification: Detecting and Reducing Hallucination in Summaries of Academic Papers
Charlie George
Andreas Stuhlmuller
HILM
81
9
0
16 Oct 2023
IW-GAE: Importance Weighted Group Accuracy Estimation for Improved Calibration and Model Selection in Unsupervised Domain Adaptation
International Conference on Machine Learning (ICML), 2023
Taejong Joo
Diego Klabjan
327
1
0
16 Oct 2023
Exploring Large Language Models for Multi-Modal Out-of-Distribution Detection
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yi Dai
Hao Lang
Kaisheng Zeng
Fei Huang
Yongbin Li
OODD
199
16
0
12 Oct 2023
RoboCLIP: One Demonstration is Enough to Learn Robot Policies
Neural Information Processing Systems (NeurIPS), 2023
Sumedh Anand Sontakke
Jesse Zhang
Sébastien M. R. Arnold
Karl Pertsch
Erdem Biyik
Dorsa Sadigh
Chelsea Finn
Laurent Itti
OffRL
186
109
0
11 Oct 2023
Imitation Learning from Purified Demonstration
International Conference on Machine Learning (ICML), 2023
Yunke Wang
Minjing Dong
Bo Du
Chang Xu
157
1
0
11 Oct 2023
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Samuel Marks
Max Tegmark
HILM
386
332
0
10 Oct 2023
SALMON: Self-Alignment with Instructable Reward Models
International Conference on Learning Representations (ICLR), 2023
Zhiqing Sun
Songlin Yang
Hongxin Zhang
Qinhong Zhou
Zhenfang Chen
David D. Cox
Yiming Yang
Chuang Gan
ALM, SyDa
293
53
0
09 Oct 2023
Dynamic value alignment through preference aggregation of multiple objectives
Marcin Korecki
Damian Dailisan
Cesare Carissimo
208
1
0
09 Oct 2023
Replication of Multi-agent Reinforcement Learning for the "Hide and Seek" Problem
Haider Kamal
M. Niazi
Hammad Afzal
189
0
0
09 Oct 2023
Balancing Autonomy and Alignment: A Multi-Dimensional Taxonomy for Autonomous LLM-powered Multi-Agent Architectures
Thorsten Händler
LLMAG
138
34
0
05 Oct 2023
Safe Exploration in Reinforcement Learning: A Generalized Formulation and Algorithms
Neural Information Processing Systems (NeurIPS), 2023
Akifumi Wachi
Wataru Hashimoto
Xun Shen
Kazumune Hashimoto
234
16
0
05 Oct 2023
Assessing Large Language Models on Climate Information
International Conference on Machine Learning (ICML), 2023
Jannis Bulian
Mike S. Schäfer
Afra Amini
Heidi Lam
Massimiliano Ciaramita
...
Michelle Chen Huebscher
Christian Buck
Niels G. Mede
Markus Leippold
Nadine Strauss
ELM
198
31
0
04 Oct 2023
Searching for High-Value Molecules Using Reinforcement Learning and Transformers
International Conference on Learning Representations (ICLR), 2023
Raj Ghugare
Santiago Miret
Adriana Hugessen
Mariano Phielipp
Glen Berseth
193
18
0
04 Oct 2023
Functional trustworthiness of AI systems by statistically valid testing
Bernhard Nessler
Thomas Doms
Sepp Hochreiter
108
0
0
04 Oct 2023
Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
E. Zelikman
Eliana Lorch
Lester W. Mackey
Adam Tauman Kalai
LRM, ReLM
236
74
0
03 Oct 2023
DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models
Albert Garde
Esben Kran
Fazl Barez
278
2
0
03 Oct 2023
LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model
Muhammad Ahmed Shah
Roshan S. Sharma
Hira Dhamyal
R. Olivier
Ankit Shah
...
Massa Baali
Soham Deshmukh
Michael Kuhlmann
Bhiksha Raj
Rita Singh
AAML
119
23
0
02 Oct 2023