Concrete Problems in AI Safety

21 June 2016

Papers citing "Concrete Problems in AI Safety"

50 / 1,371 papers shown

Title
Omega-Regular Decision Processes AAAI Conference on Artificial Intelligence (AAAI), 2023 E. M. Hahn Mateo Perez S. Schewe Fabio Somenzi Ashutosh Trivedi D. Wojtczak 123 1 0 14 Dec 2023
LLF-Bench: Benchmark for Interactive Learning from Language Feedback Ching-An Cheng Andrey Kolobov Dipendra Kumar Misra Allen Nie Adith Swaminathan 210 24 0 11 Dec 2023
Modeling Risk in Reinforcement Learning: A Literature Mapping Leonardo Villalobos-Arias Derek Martin Abhijeet Krishnan Madeleine Gagné Colin M. Potts Arnav Jhala 221 0 0 08 Dec 2023
Deep Learning for Koopman-based Dynamic Movement Primitives Tyler Han Carl Glen Henshaw 124 0 0 06 Dec 2023
Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees Neural Information Processing Systems (NeurIPS), 2023 Dorde Zikelic Mathias Lechner Abhinav Verma K. Chatterjee T. Henzinger 200 17 0 03 Dec 2023
A Multifidelity Sim-to-Real Pipeline for Verifiable and Compositional Reinforcement Learning IEEE International Conference on Robotics and Automation (ICRA), 2023 Cyrus Neary Christian Ellis Aryaman Singh Samyal Craig T. Lennon Ufuk Topcu OffRL 817 1 0 02 Dec 2023
Nash Learning from Human Feedback International Conference on Machine Learning (ICML), 2023 Rémi Munos Michal Valko Daniele Calandriello M. G. Azar Mark Rowland ... Nikola Momchev Olivier Bachem D. Mankowitz Doina Precup Bilal Piot 393 181 0 01 Dec 2023
Foundational Moral Values for AI Alignment Betty Hou Brian Patrick Green 140 1 0 28 Nov 2023
(Ir)rationality in AI: State of the Art, Research Challenges and Open Questions Artificial Intelligence Review (AIR), 2023 Olivia Macmillan-Scott Mirco Musolesi 335 3 0 28 Nov 2023
Survey on AI Ethics: A Socio-technical Perspective International Conference on Climate Informatics (ICCI), 2023 Dave Mbiazi Meghana Bhange Maryam Babaei Ivaxi Sheth Patrik Kenfack Samira Ebrahimi Kahou 291 8 0 28 Nov 2023
Exploring the Robustness of Model-Graded Evaluations and Automated Interpretability Simon Lermen Ondřej Kvapil ELM AAML 96 3 0 26 Nov 2023
Efficient Open-world Reinforcement Learning via Knowledge Distillation and Autonomous Rule Discovery Ekaterina Nikonova Cheng Xue Jochen Renz CLL 142 1 0 24 Nov 2023
GPQA: A Graduate-Level Google-Proof Q&A Benchmark David Rein Betty Li Hou Asa Cooper Stickland Jackson Petty Richard Yuanzhe Pang Julien Dirani Julian Michael Samuel R. Bowman AI4MH ELM 349 1,531 0 20 Nov 2023
Towards Few-shot Out-of-Distribution Detection Jiuqing Dong Yongbin Gao Heng Zhou Jun Cen Yifan Yao Sook Yoon Park Dong Sun OODD 166 3 0 20 Nov 2023
Refining Perception Contracts: Case Studies in Vision-based Safe Auto-landing Yangge Li Benjamin C Yang Yixuan Jia Daniel Zhuang Sayan Mitra 242 5 0 15 Nov 2023
Cooperative AI via Decentralized Commitment Devices Xinyuan Sun Davide Crapis Matt Stephenson B. Monnot Thomas Thiery Jonathan Passerat-Palmbach 197 13 0 14 Nov 2023
EviPrompt: A Training-Free Evidential Prompt Generation Method for Segment Anything Model in Medical Images Yinsong Xu Jiaqi Tang Aidong Men Qingchao Chen VLM MedIm 215 9 0 10 Nov 2023
Why Do Probabilistic Clinical Models Fail To Transport Between Sites? Thomas A. Lasko Eric V. Strobl William W Stead OOD 142 17 0 08 Nov 2023
Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models Michael Lan Phillip H. S. Torr Fazl Barez LRM 295 8 0 07 Nov 2023
SeRO: Self-Supervised Reinforcement Learning for Recovery from Out-of-Distribution Situations International Joint Conference on Artificial Intelligence (IJCAI), 2023 Chan Kim JaeKyung Cho C. Bobda Seung-Woo Seo Seong-Woo Kim 177 4 0 07 Nov 2023
CLIP-Motion: Learning Reward Functions for Robotic Actions Using Consecutive Observations Xuzhe Dang Stefan Edelkamp 394 7 0 06 Nov 2023
Reinforcement Learning for Safety Testing: Lessons from A Mobile Robot Case Study Tom P. Huck Martin Kaiser Constantin Cronrath Bengt Lennartson Torsten Kröger Tamim Asfour 105 1 0 06 Nov 2023
Online Non-convex Optimization with Long-term Non-convex Constraints Shijie Pan Jianyu Xu Wenjie Huang 234 0 0 04 Nov 2023
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B Simon Lermen Charlie Rogers-Smith Jeffrey Ladish ALM 200 136 0 31 Oct 2023
A Review of the Evidence for Existential Risk from AI via Misaligned Power-Seeking Rose Hadshar 122 9 0 27 Oct 2023
Social Contract AI: Aligning AI Assistants with Implicit Group Norms Jan-Philipp Fränken Sam Kwok Peixuan Ye Kanishk Gandhi Dilip Arumugam Jared Moore Alex Tamkin Tobias Gerstenberg Noah D. Goodman 230 9 0 26 Oct 2023
Multi-scale Diffusion Denoised Smoothing Neural Information Processing Systems (NeurIPS), 2023 Jongheon Jeong Jinwoo Shin DiffM 249 13 0 25 Oct 2023
Can You Rely on Your Model Evaluation? Improving Model Evaluation with Synthetic Test Data Neural Information Processing Systems (NeurIPS), 2023 B. V. Breugel Nabeel Seedat F. Imrie M. Schaar SyDa 159 35 0 25 Oct 2023
DePAint: A Decentralized Safe Multi-Agent Reinforcement Learning Algorithm considering Peak and Average Constraints Raheeb Hassan K. M. S. Wadith Md. Mamun-or Rashid Md. Mosaddek Khan 187 3 0 22 Oct 2023
LUNA: A Model-Based Universal Analysis Framework for Large Language Models IEEE Transactions on Software Engineering (TSE), 2023 Da Song Xuan Xie Yuheng Huang Derui Zhu Yuheng Huang Felix Juefei Xu Lei Ma ALM 266 9 0 22 Oct 2023
A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs AAAI Conference on Artificial Intelligence (AAAI), 2023 Mateo Perez Fabio Somenzi Ashutosh Trivedi 333 10 0 18 Oct 2023
Understanding Reward Ambiguity Through Optimal Transport Theory in Inverse Reinforcement Learning Ali Baheri 37 5 0 18 Oct 2023
Compositional preference models for aligning LMs International Conference on Learning Representations (ICLR), 2023 Dongyoung Go Tomasz Korbak Germán Kruszewski Jos Rozen Marc Dymetman 247 25 0 17 Oct 2023
Factored Verification: Detecting and Reducing Hallucination in Summaries of Academic Papers Charlie George Andreas Stuhlmuller HILM 81 9 0 16 Oct 2023
IW-GAE: Importance Weighted Group Accuracy Estimation for Improved Calibration and Model Selection in Unsupervised Domain Adaptation International Conference on Machine Learning (ICML), 2023 Taejong Joo Diego Klabjan 327 1 0 16 Oct 2023
Exploring Large Language Models for Multi-Modal Out-of-Distribution Detection Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 Yi Dai Hao Lang Kaisheng Zeng Fei Huang Yongbin Li OODD 199 16 0 12 Oct 2023
RoboCLIP: One Demonstration is Enough to Learn Robot Policies Neural Information Processing Systems (NeurIPS), 2023 Sumedh Anand Sontakke Jesse Zhang Sébastien M. R. Arnold Karl Pertsch Erdem Biyik Dorsa Sadigh Chelsea Finn Laurent Itti OffRL 186 109 0 11 Oct 2023
Imitation Learning from Purified Demonstration International Conference on Machine Learning (ICML), 2023 Yunke Wang Minjing Dong Bo Du Chang Xu 157 1 0 11 Oct 2023
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets Samuel Marks Max Tegmark HILM 386 332 0 10 Oct 2023
SALMON: Self-Alignment with Instructable Reward Models International Conference on Learning Representations (ICLR), 2023 Zhiqing Sun Songlin Yang Hongxin Zhang Qinhong Zhou Zhenfang Chen David D. Cox Yiming Yang Chuang Gan ALM SyDa 293 53 0 09 Oct 2023
Dynamic value alignment through preference aggregation of multiple objectives Marcin Korecki Damian Dailisan Cesare Carissimo 208 1 0 09 Oct 2023
Replication of Multi-agent Reinforcement Learning for the "Hide and Seek" Problem Haider Kamal M. Niazi Hammad Afzal 189 0 0 09 Oct 2023
Balancing Autonomy and Alignment: A Multi-Dimensional Taxonomy for Autonomous LLM-powered Multi-Agent Architectures Thorsten Händler LLMAG 138 34 0 05 Oct 2023
Safe Exploration in Reinforcement Learning: A Generalized Formulation and Algorithms Neural Information Processing Systems (NeurIPS), 2023 Akifumi Wachi Wataru Hashimoto Xun Shen Kazumune Hashimoto 234 16 0 05 Oct 2023
Assessing Large Language Models on Climate Information International Conference on Machine Learning (ICML), 2023 Jannis Bulian Mike S. Schäfer Afra Amini Heidi Lam Massimiliano Ciaramita ... Michelle Chen Huebscher Christian Buck Niels G. Mede Markus Leippold Nadine Strauss ELM 198 31 0 04 Oct 2023
Searching for High-Value Molecules Using Reinforcement Learning and Transformers International Conference on Learning Representations (ICLR), 2023 Raj Ghugare Santiago Miret Adriana Hugessen Mariano Phielipp Glen Berseth 193 18 0 04 Oct 2023
Functional trustworthiness of AI systems by statistically valid testing Bernhard Nessler Thomas Doms Sepp Hochreiter 108 0 0 04 Oct 2023
Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation E. Zelikman Eliana Lorch Lester W. Mackey Adam Tauman Kalai LRM ReLM 236 74 0 03 Oct 2023
DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models Albert Garde Esben Kran Fazl Barez 278 2 0 03 Oct 2023
LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model Muhammad Ahmed Shah Roshan S. Sharma Hira Dhamyal R. Olivier Ankit Shah ... Massa Baali Soham Deshmukh Michael Kuhlmann Bhiksha Raj Rita Singh AAML 119 23 0 02 Oct 2023