Lockpicking LLMs: A Logit-Based Jailbreak Using Token-level Manipulation

20 May 2024
Yuxi Li, Yi Liu, Yuekang Li, Ling Shi, Gelei Deng, Shengquan Chen, Kailong Wang
arXiv (abs) · PDF · HTML · GitHub (19★)

Papers citing "Lockpicking LLMs: A Logit-Based Jailbreak Using Token-level Manipulation"

9 citing papers shown

LLM in the Middle: A Systematic Review of Threats and Mitigations to Real-World LLM-based Systems
Vitor Hugo Galhardo Moia, Igor Jochem Sanz, Gabriel Antonio Fontes Rebello, Rodrigo Duarte de Meneses, Briland Hitaj, Ulf Lindqvist
12 Sep 2025

Embedding Poisoning: Bypassing Safety Alignment via Embedding Semantic Shift
Shuai Yuan, Zhibo Zhang, Yuxi Li, Guangdong Bai, Kailong Wang
08 Sep 2025

Where to Start Alignment? Diffusion Large Language Model May Demand a Distinct Position
Zhixin Xie, Xurui Song, Jun Luo
17 Aug 2025

Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
Huizhen Shu, Xuying Li, Qirui Wang, Yuji Kosuga, Mengqiu Tian, Zhuo Li
14 Aug 2025

Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Neural Information Processing Systems (NeurIPS), 2024
Kaifeng Lyu, Haoyu Zhao, Xinran Gu, Dingli Yu, Anirudh Goyal, Sanjeev Arora
20 Jan 2025

Efficient Detection of Toxic Prompts in Large Language Models
International Conference on Automated Software Engineering (ASE), 2024
Yi Liu, Junzhe Yu, Huijia Sun, Ling Shi, Gelei Deng, Yuqi Chen, Yang Liu
21 Aug 2024

Lifelong Knowledge Editing for LLMs with Retrieval-Augmented Continuous Prompt Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Qizhou Chen, Taolin Zhang, Xiaofeng He, Dongyang Li, Chengyu Wang, Longtao Huang, Hui Xue
06 May 2024

MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory
Ali Modarressi, Abdullatif Köksal, Ayyoob Imani, Mohsen Fayyaz, Hinrich Schütze
17 Apr 2024

Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Jiachen Ma, Anda Cao, Zhiqing Xiao, Jie Zhang, Chaonan Ye, Chao Ye, Junbo Zhao
02 Apr 2024