arXiv:2502.16366
A generative approach to LLM harmfulness detection with special red flag tokens
22 February 2025
Sophie Xhonneux, David Dobre, Mehrnaz Mohfakhami, Leo Schwinn, Gauthier Gidel
Papers citing "A generative approach to LLM harmfulness detection with special red flag tokens"
Safety Pretraining: Toward the Next Generation of Safe AI
Pratyush Maini, Sachin Goyal, Dylan Sam, Alex Robey, Yash Savani, Yiding Jiang, Andy Zou, Zachary C. Lipton, J. Zico Kolter
23 April 2025