
Self-Steering Optimization: Autonomous Preference Optimization for Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Main: 8 pages · Appendix: 2 pages · Bibliography: 3 pages · 4 figures · 10 tables
Abstract

The key to effective alignment lies in high-quality preference data. Recent research has focused on automated alignment, which seeks to build alignment systems with minimal human intervention. However, prior work has concentrated on data generation methods while paying insufficient attention to quality control, which often yields inaccurate and unhelpful data and leads to unpredictable benefits during iterative optimization. In this paper, we present Self-Steering Optimization (SSO), an algorithm that autonomously generates high-quality preference data, eliminating the need for manual annotation. SSO employs a specialized optimization objective to build a data generator from the policy model itself, which is then used to produce accurate and on-policy data. We demonstrate SSO's effectiveness through comprehensive experiments on two series of models, Llama 3 and Qwen 2. Our evaluation across diverse benchmarks shows that SSO consistently outperforms baselines in human preference alignment and reward optimization. Further analysis validates SSO as a scalable framework for preference optimization, benefiting the advancement of automated alignment techniques.
