ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

A short variational proof of equivalence between policy gradients and soft Q learning

22 December 2017
Pierre Harvey Richemond
B. Maginnis
arXiv:1712.08650

Papers citing "A short variational proof of equivalence between policy gradients and soft Q learning"

Offline Regularised Reinforcement Learning for Large Language Models Alignment
Pierre Harvey Richemond
Yunhao Tang
Daniel Guo
Daniele Calandriello
M. G. Azar
...
Gil Shamir
Rishabh Joshi
Tianqi Liu
Rémi Munos
Bilal Piot
OffRL
29 May 2024