A2C is a special case of PPO


18 May 2022
Shengyi Huang
Anssi Kanervisto
Antonin Raffin
Weixun Wang
Santiago Ontañón
Rousslan Fernand Julien Dossa
    OffRL
arXiv: 2205.09123

Papers citing "A2C is a special case of PPO"

2 / 2 papers shown
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch
Shengyi Huang
Sophie Xhonneux
Arian Hosseini
Rishabh Agarwal
Aaron C. Courville
OffRL
23 Oct 2024
Target-independent XLA optimization using Reinforcement Learning
Milan Ganai
Haichen Li
Theodore Enns
Yida Wang
Randy Huang
28 Aug 2023