arXiv:2404.14230
Resistance Against Manipulative AI: key factors and possible actions
22 April 2024
Piotr Wilczyński, Wiktoria Mieleszczenko-Kowszewicz, P. Biecek
ArXiv
Papers citing "Resistance Against Manipulative AI: key factors and possible actions"
3 / 3 papers shown
Assessing AI vs Human-Authored Spear Phishing SMS Attacks: An Empirical Study
Jerson Francia, Derek Hansen, Ben Schooley, Matthew Taylor, Shydra Murray, Greg Snow
26
1
0
18 Jun 2024
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark
218
443
0
23 Aug 2022
Challenges in Detoxifying Language Models
Johannes Welbl, Amelia Glaese, J. Uesato, Sumanth Dathathri, John F. J. Mellor, Lisa Anne Hendricks, Kirsty Anderson, Pushmeet Kohli, Ben Coppin, Po-Sen Huang
LM&MA
242
193
0
15 Sep 2021