2503.10093
Cited By
Representation-based Reward Modeling for Efficient Safety Alignment of Large Language Model
13 March 2025
Qiyuan Deng
X. Bai
Kehai Chen
Yaowei Wang
Liqiang Nie
Min Zhang
OffRL
Papers citing
"Representation-based Reward Modeling for Efficient Safety Alignment of Large Language Model"
Title
No papers