On the Role of Attention Heads in Large Language Model Safety
arXiv: 2410.13708, 17 October 2024
Z. Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Kun Wang, Yang Liu, Junfeng Fang, Yongbin Li
Papers citing "On the Role of Attention Heads in Large Language Model Safety" (4 of 4 papers shown)
ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models
Chung-En Sun, Ge Yan, Tsui-Wei Weng
KELM, LRM
27 Mar 2025
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Thomas Winninger, Boussad Addad, Katarzyna Kapusta
AAML
08 Mar 2025
Understanding and Rectifying Safety Perception Distortion in VLMs
Xiaohan Zou, Jian Kang, George Kesidis, Lu Lin
18 Feb 2025
Reinforced Lifelong Editing for Language Models
Zherui Li, Houcheng Jiang, Hao Chen, Baolong Bi, Z. Zhou, Fei Sun, Junfeng Fang, X. Wang
KELM
09 Feb 2025