Towards Reward Fairness in RLHF: From a Resource Allocation Perspective. Annual Meeting of the Association for Computational Linguistics (ACL), 2025.
MPO: Multilingual Safety Alignment via Reward Gap Optimization. Annual Meeting of the Association for Computational Linguistics (ACL), 2025.