arXiv:2410.10014
Safety-Aware Fine-Tuning of Large Language Models
13 October 2024
Hyeong Kyu Choi, Xuefeng Du, Yixuan Li
Papers citing "Safety-Aware Fine-Tuning of Large Language Models"
2 / 2 papers shown
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Jan Betley, Daniel Tan, Niels Warncke, Anna Sztyber-Betley, Xuchan Bao, Martín Soto, Nathan Labenz, Owain Evans
AAML
73
8
0
24 Feb 2025
Process Reward Model with Q-Value Rankings
W. Li, Yixuan Li
LRM
39
13
0
15 Oct 2024