Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach

3 December 2024

Papers citing "Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach"


FLAME: Flexible LLM-Assisted Moderation Engine
Ivan Bakulin, Ilia Kopanichuk, Iaroslav Bespalov, Nikita Radchenko, V. Shaposhnikov, Dmitry V. Dylov, Ivan Oseledets
13 Feb 2025