Lockpicking LLMs: A Logit-Based Jailbreak Using Token-level Manipulation

20 May 2024
Yuxi Li, Yi Liu, Yuekang Li, Ling Shi, Gelei Deng, Shengquan Chen, Kailong Wang
arXiv (abs) · PDF · HTML · GitHub (19★)

Papers citing "Lockpicking LLMs: A Logit-Based Jailbreak Using Token-level Manipulation"

9 citing papers shown

LLM in the Middle: A Systematic Review of Threats and Mitigations to Real-World LLM-based Systems
Vitor Hugo Galhardo Moia, Igor Jochem Sanz, Gabriel Antonio Fontes Rebello, Rodrigo Duarte de Meneses, Briland Hitaj, Ulf Lindqvist
12 Sep 2025

Embedding Poisoning: Bypassing Safety Alignment via Embedding Semantic Shift
Shuai Yuan, Zhibo Zhang, Yuxi Li, Guangdong Bai, Kailong Wang
08 Sep 2025

Where to Start Alignment? Diffusion Large Language Model May Demand a Distinct Position
Zhixin Xie, Xurui Song, Jun Luo
17 Aug 2025

Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
Huizhen Shu, Xuying Li, Qirui Wang, Yuji Kosuga, Mengqiu Tian, Zhuo Li
14 Aug 2025

Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Neural Information Processing Systems (NeurIPS), 2024
Kaifeng Lyu, Haoyu Zhao, Xinran Gu, Dingli Yu, Anirudh Goyal, Sanjeev Arora
20 Jan 2025

Efficient Detection of Toxic Prompts in Large Language Models
International Conference on Automated Software Engineering (ASE), 2024
Yi Liu, Junzhe Yu, Huijia Sun, Ling Shi, Gelei Deng, Yuqi Chen, Yang Liu
21 Aug 2024

Lifelong Knowledge Editing for LLMs with Retrieval-Augmented Continuous Prompt Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Qizhou Chen, Taolin Zhang, Xiaofeng He, Dongyang Li, Chengyu Wang, Longtao Huang, Hui Xue
06 May 2024

MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory
Ali Modarressi, Abdullatif Köksal, Ayyoob Imani, Mohsen Fayyaz, Hinrich Schütze
17 Apr 2024

Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Jiachen Ma, Anda Cao, Zhiqing Xiao, Jie Zhang, Chaonan Ye, Chao Ye, Junbo Zhao
02 Apr 2024