Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities

24 October 2024

Jianfeng Gao

Papers citing "Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities"


ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models
Chung-En Sun, Ge Yan, Tsui-Wei Weng
KELM, LRM
27 Mar 2025