
RLSF: Reinforcement Learning via Symbolic Feedback

Main: 6 pages, 5 figures, 4 tables
Bibliography: 2 pages
Appendix: 3 pages
Abstract

Reinforcement Learning with Human Feedback (RLHF) is considered a standard approach to fine-tuning Large Language Models (LLMs). However, such methods face limitations such as unsound black-box reward models, difficulties in collecting human preference data, and reliance on sparse scalar rewards. As a result, they often fall short on tasks that require complex domain-specific understanding.
