2505.16559
CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning
22 May 2025
Biao Yi
Tiansheng Huang
Baolei Zhang
Tong Li
Lihai Nie
Zheli Liu
Li Shen
MU
AAML
Papers citing
"CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning"
9 / 9 papers shown
Title
Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation
Tiansheng Huang
Sihao Hu
Fatih Ilhan
Selim Furkan Tekin
Ling Liu
85
9
0
29 Jan 2025
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Kaifeng Lyu
Haoyu Zhao
Xinran Gu
Dingli Yu
Anirudh Goyal
Sanjeev Arora
ALM
133
59
0
20 Jan 2025
Open Problems in Machine Unlearning for AI Safety
Fazl Barez
Tingchen Fu
Ameya Prabhu
Stephen Casper
Amartya Sanyal
...
David M. Krueger
Sören Mindermann
José Hernandez-Orallo
Mor Geva
Y. Gal
MU
113
24
0
10 Jan 2025
SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation
Mingjie Li
Wai Man Si
Michael Backes
Yang Zhang
Yisen Wang
118
19
0
03 Jan 2025
Enhancing AI Safety Through the Fusion of Low Rank Adapters
Satya Swaroop Gudipudi
Sreeram Vipparla
Harpreet Singh
Shashwat Goel
Ponnurangam Kumaraguru
MoMe
AAML
86
3
0
30 Dec 2024
Do Unlearning Methods Remove Information from Language Model Weights?
Aghyad Deeb
Fabien Roger
AAML
MU
113
29
0
11 Oct 2024
A Closer Look at Machine Unlearning for Large Language Models
Xiaojian Yuan
Tianyu Pang
Chao Du
Kejiang Chen
Weiming Zhang
Min Lin
MU
259
13
0
10 Oct 2024
An Adversarial Perspective on Machine Unlearning for AI Safety
Jakub Łucki
Boyi Wei
Yangsibo Huang
Peter Henderson
F. Tramèr
Javier Rando
MU
AAML
206
53
0
26 Sep 2024
Tamper-Resistant Safeguards for Open-Weight LLMs
Rishub Tamirisa
Bhrugu Bharathi
Long Phan
Andy Zhou
Alice Gatti
...
Andy Zou
Dawn Song
Bo Li
Dan Hendrycks
Mantas Mazeika
AAML
MU
133
63
0
01 Aug 2024