
RLSF: Reinforcement Learning via Symbolic Feedback

Main: 6 pages, 5 figures, 4 tables
Bibliography: 2 pages
Appendix: 3 pages
Abstract

Reinforcement Learning with Human Feedback (RLHF) is considered a standard approach to fine-tuning Large Language Models (LLMs). However, such methods face limitations such as unsound black-box reward models, difficulties in collecting human preference data, and reliance on sparse scalar rewards. As a result, they often fall short on tasks that require complex domain-specific understanding.
