AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback. Neural Information Processing Systems (NeurIPS), 2023. Yann Dubois, Xuechen Li, Rohan Taori, Tianyi Zhang, Ishaan Gulrajani, Jimmy Ba, Carlos Guestrin, Abigail Z. Jacobs, Tatsunori B. Hashimoto.
Survey on reinforcement learning for language processing. Artificial Intelligence Review (AIR), 2021.
Stochastic Structured Prediction under Bandit Feedback. Neural Information Processing Systems (NeurIPS), 2016.