Human-AI Safety: A Descendant of Generative AI and Control Systems Safety

16 May 2024

Papers citing "Human-AI Safety: A Descendant of Generative AI and Control Systems Safety"


| Title | Authors | Tags | | | | Date |
|---|---|---|---|---|---|---|
| Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond | Shanshan Han | | 55 | 1 | 0 | 09 Oct 2024 |
| "Task Success" is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors | L. Guan, Yifan Zhou, Denis Liu, Yantian Zha, H. B. Amor, Subbarao Kambhampati | LM&Ro | 18 | 16 | 0 | 06 Feb 2024 |
| Generative Agents: Interactive Simulacra of Human Behavior | J. Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein | LM&Ro, AI4CE | 206 | 1,701 | 0 | 07 Apr 2023 |
| Online Update of Safety Assurances Using Confidence-Based Predictions | Kensuke Nakamura, Somil Bansal | | 30 | 19 | 0 | 03 Oct 2022 |
| Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned | Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark | | 216 | 327 | 0 | 23 Aug 2022 |
| Training language models to follow instructions with human feedback | Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe | OSLM, ALM | 301 | 11,730 | 0 | 04 Mar 2022 |
| Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou | LM&Ro, LRM, AI4CE, ReLM | 315 | 8,261 | 0 | 28 Jan 2022 |
| Safety Assurances for Human-Robot Interaction via Confidence-aware Game-theoretic Human Models | Ran Tian, Liting Sun, Andrea V. Bajcsy, M. Tomizuka, Anca Dragan | | 32 | 55 | 0 | 29 Sep 2021 |
| Unsolved Problems in ML Safety | Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt | | 156 | 268 | 0 | 28 Sep 2021 |
| Safe Nonlinear Control Using Robust Neural Lyapunov-Barrier Functions | Charles Dawson, Zengyi Qin, Sicun Gao, Chuchu Fan | | 102 | 168 | 0 | 14 Sep 2021 |
| TrafficSim: Learning to Simulate Realistic Multi-Agent Behaviors | Simon Suo, S. Regalado, Sergio Casas, R. Urtasun | | 134 | 221 | 0 | 17 Jan 2021 |
| Fine-Tuning Language Models from Human Preferences | Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving | ALM | 273 | 1,561 | 0 | 18 Sep 2019 |