Compromesso! Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models
Fabio Pernisi, Dirk Hovy, Paul Röttger
arXiv: 2408.04522, 8 August 2024
Papers citing "Compromesso! Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models" (2 papers)
1. COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
   Xing-ming Guo, Fangxu Yu, Huan Zhang, Lianhui Qin, Bin Hu
   13 Feb 2024
2. Training language models to follow instructions with human feedback
   Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
   04 Mar 2022