Improving Policy Gradient by Exploring Under-appreciated Rewards

28 November 2016

Papers citing "Improving Policy Gradient by Exploring Under-appreciated Rewards"

Title | Authors | Tags | Counts | Date
Policy Gradient Algorithms Implicitly Optimize by Continuation | Adrien Bolland, Gilles Louppe, D. Ernst | | 36 3 0 | 11 May 2023
Learning to Reach Goals via Iterated Supervised Learning | Dibya Ghosh, Abhishek Gupta, Ashwin Reddy, Justin Fu, Coline Devin, Benjamin Eysenbach, Sergey Levine | | 24 33 0 | 12 Dec 2019
Countering the Effects of Lead Bias in News Summarization via Multi-Stage Training and Auxiliary Losses | Matt Grenander, Yue Dong, Jackie C.K. Cheung, Annie Louis | | 14 35 0 | 08 Sep 2019
Global Optimality Guarantees For Policy Gradient Methods | Jalaj Bhandari, Daniel Russo | | 35 185 0 | 05 Jun 2019
Efficient Entropy for Policy Gradient with Multidimensional Action Space | Yiming Zhang, Q. Vuong, Kenny Song, Xiao-Yue Gong, Keith Ross | | 25 16 0 | 02 Jun 2018
Neural Architecture Search using Deep Neural Networks and Monte Carlo Tree Search | Linnan Wang, Yiyang Zhao, Yuu Jinnai, Yuandong Tian, Rodrigo Fonseca | BDL | 23 50 0 | 18 May 2018
From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood | Kelvin Guu, Panupong Pasupat, E. Liu, Percy Liang | | 34 190 0 | 25 Apr 2017
Deep Reinforcement Learning: An Overview | Yuxi Li | OffRL, VLM | 104 1,502 0 | 25 Jan 2017
An Alternative Softmax Operator for Reinforcement Learning | Kavosh Asadi, Michael L. Littman | | 20 10 0 | 16 Dec 2016