RLSF: Reinforcement Learning via Symbolic Feedback
Main: 6 pages, 5 figures, 4 tables; Bibliography: 2 pages; Appendix: 3 pages
Abstract
Reinforcement Learning from Human Feedback (RLHF) is considered a standard approach to fine-tuning Large Language Models (LLMs). However, such methods face well-known limitations: unsound black-box reward models, the difficulty of collecting human preference data, and reliance on sparse scalar rewards. As a result, they often fall short on tasks that require complex domain-specific understanding.
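To make the contrast with sparse scalar rewards concrete, here is a minimal, illustrative Python sketch. The balanced-parentheses checker, the per-token credit scheme, and the function names (`is_balanced`, `scalar_reward`, `symbolic_feedback`) are all assumptions introduced for illustration; they are a toy stand-in for a symbolic tool, not the reward design used in the RLSF paper.

```python
# Illustrative sketch only: contrasts an RLHF-style sparse scalar reward
# with fine-grained "symbolic" feedback. The balanced-parentheses check
# is a toy stand-in for a symbolic tool (e.g., a parser or solver); it
# is NOT the actual mechanism proposed in the paper.

def is_balanced(s: str) -> bool:
    """Toy symbolic check: are all parentheses matched?"""
    depth = 0
    for ch in s:
        depth += {"(": 1, ")": -1}.get(ch, 0)
        if depth < 0:          # a ')' closed nothing
            return False
    return depth == 0

def scalar_reward(output: str) -> float:
    """RLHF-style signal: one opaque number for the whole generation."""
    return 1.0 if is_balanced(output) else 0.0

def symbolic_feedback(output: str) -> list[float]:
    """Hypothetical per-token signal derived from the symbolic check:
    every token from the first unmatched ')' onward is penalized, so the
    learner sees *where* the generation broke, not just *that* it did."""
    rewards, depth, broken = [], 0, False
    for ch in output:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        broken = broken or depth < 0
        rewards.append(-1.0 if broken else 1.0)
    return rewards

if __name__ == "__main__":
    out = "(()))("
    print(scalar_reward(out))      # 0.0 -- no hint about which token failed
    print(symbolic_feedback(out))  # [1.0, 1.0, 1.0, 1.0, -1.0, -1.0]
```

The point of the sketch is only that a symbolic checker can localize errors, yielding a denser training signal than a single end-of-sequence scalar.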
