A short variational proof of equivalence between policy gradients and soft Q learning
Pierre Harvey Richemond, B. Maginnis
arXiv:1712.08650 · 22 December 2017

Papers citing "A short variational proof of equivalence between policy gradients and soft Q learning"
1 / 1 papers shown
Offline Regularised Reinforcement Learning for Large Language Models Alignment
Pierre Harvey Richemond, Yunhao Tang, Daniel Guo, Daniele Calandriello, M. G. Azar, ..., Gil Shamir, Rishabh Joshi, Tianqi Liu, Rémi Munos, Bilal Piot
OffRL
29 May 2024