Resistance Against Manipulative AI: key factors and possible actions

22 April 2024

Papers citing "Resistance Against Manipulative AI: key factors and possible actions"

Assessing AI vs Human-Authored Spear Phishing SMS Attacks: An Empirical Study
Jerson Francia, Derek Hansen, Ben Schooley, Matthew Taylor, Shydra Murray, Greg Snow
18 Jun 2024

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark
23 Aug 2022

Challenges in Detoxifying Language Models
Johannes Welbl, Amelia Glaese, J. Uesato, Sumanth Dathathri, John F. J. Mellor, Lisa Anne Hendricks, Kirsty Anderson, Pushmeet Kohli, Ben Coppin, Po-Sen Huang
Tag: LM&MA
15 Sep 2021