Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2412.02159
Cited By
Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach
3 December 2024
T. T. Wang
John Hughes
Henry Sleight
Rylan Schaeffer
Rajashree Agrawal
Fazl Barez
Mrinank Sharma
Jesse Mu
Nir Shavit
Ethan Perez
AAML
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach"
1 / 1 papers shown
Title
FLAME: Flexible LLM-Assisted Moderation Engine
Ivan Bakulin
Ilia Kopanichuk
Iaroslav Bespalov
Nikita Radchenko
V. Shaposhnikov
Dmitry V. Dylov
Ivan Oseledets
84
0
0
13 Feb 2025
1