2507.21061
Cited By
Security practices in AI development
AI & Society (AS), 2025
17 May 2025
Petr Spelda
Vit Stritecky
Papers citing "Security practices in AI development"
3 / 3 papers shown
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
Mrinank Sharma
Meg Tong
Jesse Mu
Jerry Wei
Jorrit Kruthoff
...
Ruiqi Zhong
Giulio Zhou
Jan Leike
Jared Kaplan
Ethan Perez
403
94
0
31 Jan 2025
Open Problems in Machine Unlearning for AI Safety
Fazl Barez
Tingchen Fu
Christian Schroeder de Witt
Stephen Casper
Amartya Sanyal
...
David M. Krueger
Sören Mindermann
José Hernandez-Orallo
Mor Geva
Y. Gal
MU
347
36
0
10 Jan 2025
Tamper-Resistant Safeguards for Open-Weight LLMs
International Conference on Learning Representations (ICLR), 2024
Rishub Tamirisa
Bhrugu Bharathi
Long Phan
Andy Zhou
Alice Gatti
...
Andy Zou
Dawn Song
Bo Li
Dan Hendrycks
Mantas Mazeika
AAML
MU
460
105
0
01 Aug 2024