arXiv:2507.00665
SAFER: Probing Safety in Reward Models with Sparse Autoencoder
1 July 2025
Sihang Li
Wei Shi
Ziyuan Xie
Tao Liang
Guojun Ma
Xiang Wang
OffRL
Papers citing "SAFER: Probing Safety in Reward Models with Sparse Autoencoder": none listed.