2205.09123
Cited By
A2C is a special case of PPO
18 May 2022
Shengyi Huang
Anssi Kanervisto
Antonin Raffin
Weixun Wang
Santiago Ontañón
Rousslan Fernand Julien Dossa
OffRL
Papers citing "A2C is a special case of PPO"
2 / 2 papers shown
Title
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch
Shengyi Huang
Sophie Xhonneux
Arian Hosseini
Rishabh Agarwal
Aaron C. Courville
OffRL
77
4
0
23 Oct 2024
Target-independent XLA optimization using Reinforcement Learning
Milan Ganai
Haichen Li
Theodore Enns
Yida Wang
Randy Huang
21
0
0
28 Aug 2023