Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning
18 August 2024
Tiansheng Huang, Gautam Bhattacharya, Pratik Joshi, Josh Kimball, Ling Liu
Tags: AAML, MoMe
Papers citing "Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning" (3 of 3 papers shown)
Targeted Vaccine: Safety Alignment for Large Language Models against Harmful Fine-Tuning via Layer-wise Perturbation
Guozhi Liu, Weiwei Lin, Tiansheng Huang, Ruichao Mo, Qi Mu, Li Shen
Tags: AAML
13 Oct 2024
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson
Tags: AAML
07 Feb 2024
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
Tags: OSLM, ALM
04 Mar 2022