Visualising Policy-Reward Interplay to Inform Zeroth-Order Preference Optimisation of Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

5 March 2025

Alessio Galatolo

Meriem Beloucif
