Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

Papers citing "Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models"