Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.06144
Cited By
Language Models Resist Alignment
10 June 2024
Jiaming Ji
Kaile Wang
Tianyi Qiu
Boyuan Chen
Jiayi Zhou
Changye Li
Hantao Lou
Yaodong Yang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Language Models Resist Alignment"
4 / 4 papers shown
Title
Compression Represents Intelligence Linearly
Yuzhen Huang
Jinghan Zhang
Zifei Shan
Junxian He
39
24
0
15 Apr 2024
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Boyi Wei
Kaixuan Huang
Yangsibo Huang
Tinghao Xie
Xiangyu Qi
Mengzhou Xia
Prateek Mittal
Mengdi Wang
Peter Henderson
AAML
55
78
0
07 Feb 2024
A Survey of Machine Unlearning
Thanh Tam Nguyen
T. T. Huynh
Phi Le Nguyen
Alan Wee-Chung Liew
Hongzhi Yin
Quoc Viet Hung Nguyen
MU
77
216
0
06 Sep 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
1