arXiv: 2502.11555
Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models
17 February 2025
Yingshui Tan
Yilei Jiang
Y. Li
J. Liu
Xingyuan Bu
Wenbo Su
Xiangyu Yue
Xiaoyong Zhu
Bo Zheng
Papers citing
"Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models"
No citing papers.