V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control

26 September 2019
H. F. Song
A. Abdolmaleki
Jost Tobias Springenberg
Aidan Clark
Hubert Soyer
Jack W. Rae
Seb Noury
Arun Ahuja
Siqi Liu
Dhruva Tirumala
N. Heess
Dan Belov
Martin Riedmiller
M. Botvinick

Papers citing "V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control"

38 / 88 papers shown
Learning Coordinated Terrain-Adaptive Locomotion by Imitating a Centroidal Dynamics Planner
Philemon Brakel
Steven Bohez
Leonard Hasenclever
N. Heess
Konstantinos Bousmalis
13
16
0
30 Oct 2021
Statistical discrimination in learning agents
Edgar A. Duéñez-Guzmán
Kevin R. McKee
Yiran Mao
Ben Coppin
Silvia Chiappa
...
Yoram Bachrach
Suzanne Sadedin
William S. Isaac
K. Tuyls
Joel Z Leibo
39
7
0
21 Oct 2021
Collaborating with Humans without Human Data
D. Strouse
Kevin R. McKee
M. Botvinick
Edward Hughes
Richard Everett
124
161
0
15 Oct 2021
Faster Improvement Rate Population Based Training
Valentin Dalibard
Max Jaderberg
17
10
0
28 Sep 2021
Open-Ended Learning Leads to Generally Capable Agents
Open-Ended Learning Team
Adam Stooke
Anuj Mahajan
Catarina Barros
Charlie Deck
...
Nicolas Porcel
Roberta Raileanu
Steph Hughes-Fitt
Valentin Dalibard
Wojciech M. Czarnecki
26
181
0
27 Jul 2021
Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot
Joel Z Leibo
Edgar A. Duéñez-Guzmán
A. Vezhnevets
J. Agapiou
P. Sunehag
Raphael Köster
Jayd Matyas
Charlie Beattie
Igor Mordatch
T. Graepel
OffRL
58
103
0
14 Jul 2021
CoBERL: Contrastive BERT for Reinforcement Learning
Andrea Banino
Adrià Puigdomènech Badia
Jacob Walker
Tim Scholtes
Jovana Mitrović
Charles Blundell
OffRL
30
36
0
12 Jul 2021
Using Probabilistic Movement Primitives in Analyzing Human Motion Difference under Transcranial Current Stimulation
Honghu Xue
Rebecca Herzog
Till M. Berger
T. Bäumer
A. Weissbach
Elmar Rueckert
12
5
0
05 Jul 2021
On-Policy Deep Reinforcement Learning for the Average-Reward Criterion
Yiming Zhang
Keith Ross
OffRL
25
39
0
14 Jun 2021
An Entropy Regularization Free Mechanism for Policy-based Reinforcement Learning
Changnan Xiao
Haosen Shi
Jiajun Fan
Shihong Deng
18
5
0
01 Jun 2021
Learning and Planning in Complex Action Spaces
Thomas Hubert
Julian Schrittwieser
Ioannis Antonoglou
M. Barekatain
Simon Schmitt
David Silver
19
76
0
13 Apr 2021
Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation
Emilio Parisotto
Ruslan Salakhutdinov
42
44
0
04 Apr 2021
Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning
Hiroki Furuta
Tadashi Kozuno
T. Matsushima
Y. Matsuo
S. Gu
15
14
0
31 Mar 2021
Improved Regret Bound and Experience Replay in Regularized Policy Iteration
N. Lazić
Dong Yin
Yasin Abbasi-Yadkori
Csaba Szepesvári
OffRL
6
17
0
25 Feb 2021
Quantifying the effects of environment and population diversity in multi-agent reinforcement learning
Kevin R. McKee
Joel Z Leibo
Charlie Beattie
Richard Everett
44
31
0
16 Feb 2021
Optimization Issues in KL-Constrained Approximate Policy Iteration
N. Lazić
Botao Hao
Yasin Abbasi-Yadkori
Dale Schuurmans
Csaba Szepesvári
19
10
0
11 Feb 2021
Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent States
Shi Dong
Benjamin Van Roy
Zhengyuan Zhou
16
29
0
10 Feb 2021
Alchemy: A benchmark and analysis toolkit for meta-reinforcement learning agents
Jane X. Wang
Michael King
Nicolas Porcel
Z. Kurth-Nelson
Tina Zhu
...
Neil C. Rabinowitz
Loic Matthey
Demis Hassabis
Alexander Lerchner
M. Botvinick
OffRL
23
30
0
04 Feb 2021
Differentiable Trust Region Layers for Deep Reinforcement Learning
Fabian Otto
P. Becker
Ngo Anh Vien
Hanna Ziesche
Gerhard Neumann
OffRL
30
19
0
22 Jan 2021
Stabilizing Transformer-Based Action Sequence Generation For Q-Learning
Gideon Stein
Andrey Filchenkov
Arip Asadulaev
OffRL
21
2
0
23 Oct 2020
Logistic Q-Learning
Joan Bas-Serrano
Sebastian Curi
Andreas Krause
Gergely Neu
9
40
0
21 Oct 2020
Local Search for Policy Iteration in Continuous Control
Jost Tobias Springenberg
N. Heess
D. Mankowitz
J. Merel
Arunkumar Byravan
...
Julian Schrittwieser
Yuval Tassa
J. Buchli
Dan Belov
Martin Riedmiller
OffRL
14
15
0
12 Oct 2020
My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control
Vitaly Kurin
Maximilian Igl
Tim Rocktäschel
Wendelin Boehmer
Shimon Whiteson
AI4CE
13
85
0
05 Oct 2020
Revisiting Design Choices in Proximal Policy Optimization
Chloe Ching-Yun Hsu
Celestine Mendler-Dünner
Moritz Hardt
9
53
0
23 Sep 2020
Physically Embedded Planning Problems: New Challenges for Reinforcement Learning
Mehdi Mirza
Andrew Jaegle
Jonathan J. Hunt
A. Guez
S. Tunyasuvunakool
...
Peter Karkus
S. Racanière
Lars Buesing
Timothy Lillicrap
N. Heess
AI4CE
23
12
0
11 Sep 2020
Phasic Policy Gradient
K. Cobbe
Jacob Hilton
Oleg Klimov
John Schulman
OffRL
7
152
0
09 Sep 2020
Monte-Carlo Tree Search as Regularized Policy Optimization
Jean-Bastien Grill
Florent Altché
Yunhao Tang
Thomas Hubert
Michal Valko
Ioannis Antonoglou
Rémi Munos
19
73
0
24 Jul 2020
Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers
Benjamin Eysenbach
Swapnil Asawa
Shreyas Chaudhari
Sergey Levine
Ruslan Salakhutdinov
24
86
0
24 Jun 2020
RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning
Çağlar Gülçehre
Ziyun Wang
Alexander Novikov
T. Paine
Sergio Gomez Colmenarejo
...
Matthew W. Hoffman
Ofir Nachum
George Tucker
N. Heess
Nando de Freitas
OffRL
27
71
0
24 Jun 2020
Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning
Yunhao Tang
A. Kucukelbir
OffRL
14
16
0
13 Jun 2020
Zeroth-Order Supervised Policy Improvement
Hao Sun
Ziping Xu
Yuhang Song
Meng Fang
Jiechao Xiong
Bo Dai
Bolei Zhou
OffRL
6
9
0
11 Jun 2020
What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study
Marcin Andrychowicz
Anton Raichuk
Piotr Stańczyk
Manu Orsini
Sertan Girgin
...
M. Geist
Olivier Pietquin
Marcin Michalski
Sylvain Gelly
Olivier Bachem
OffRL
31
213
0
10 Jun 2020
A Distributional View on Multi-Objective Policy Optimization
A. Abdolmaleki
Sandy H. Huang
Leonard Hasenclever
Michael Neunert
H. F. Song
Martina Zambelli
M. Martins
N. Heess
R. Hadsell
Martin Riedmiller
21
74
0
15 May 2020
Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning
Z. Guo
Bernardo Avila-Pires
Bilal Piot
Jean-Bastien Grill
Florent Altché
Rémi Munos
M. G. Azar
BDL
DRL
SSL
43
139
0
30 Apr 2020
Taylor Expansion Policy Optimization
Yunhao Tang
Michal Valko
Rémi Munos
OffRL
12
14
0
13 Mar 2020
Reinforcement Learning via Fenchel-Rockafellar Duality
Ofir Nachum
Bo Dai
OffRL
11
117
0
07 Jan 2020
Catch & Carry: Reusable Neural Controllers for Vision-Guided Whole-Body Tasks
J. Merel
S. Tunyasuvunakool
Arun Ahuja
Yuval Tassa
Leonard Hasenclever
Vu Pham
Tom Erez
Greg Wayne
N. Heess
23
9
0
15 Nov 2019
Stabilizing Transformers for Reinforcement Learning
Emilio Parisotto
H. F. Song
Jack W. Rae
Razvan Pascanu
Çağlar Gülçehre
...
Aidan Clark
Seb Noury
M. Botvinick
N. Heess
R. Hadsell
OffRL
22
360
0
13 Oct 2019