Beyond variance reduction: Understanding the true impact of baselines on policy optimization

International Conference on Machine Learning (ICML), 2024
31 August 2020
Wesley Chung, Valentin Thomas, Marlos C. Machado, Nicolas Le Roux
OffRL
ArXiv (abs) · PDF · HTML

Papers citing "Beyond variance reduction: Understanding the true impact of baselines on policy optimization"

20 / 20 papers shown
Multi-level Advantage Credit Assignment for Cooperative Multi-Agent Reinforcement Learning
Xutong Zhao, Yaqi Xie
09 Aug 2025

TreeRL: LLM Reinforcement Learning with On-Policy Tree Search
Zhenyu Hou, Ziniu Hu, Yujiang Li, Rui Lu, Jie Tang, Yuxiao Dong
OffRL, LRM
13 Jun 2025

Breaking Habits: On the Role of the Advantage Function in Learning Causal State Representations
Miguel Suau
CML
13 Jun 2025

Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Nicolas Le Roux, Marc G. Bellemare, Jonathan Lebensold, Arnaud Bergeron, Joshua Greaves, Alex Fréchette, Carolyne Pelletier, Eric Thibodeau-Laufer, Sándor Toth, Sam Work
OffRL
18 Mar 2025

Stochastic Gradient Succeeds for Bandits
Jincheng Mei, Zixin Zhong, Bo Dai, Alekh Agarwal, Csaba Szepesvári, Dale Schuurmans
27 Feb 2024

Behind the Myth of Exploration in Policy Gradients
Adrien Bolland, Gaspard Lambrechts, Damien Ernst
31 Jan 2024

Target-independent XLA optimization using Reinforcement Learning
Milan Ganai, Haichen Li, Theodore Enns, Yida Wang, Randy Huang
28 Aug 2023

The Role of Baselines in Policy Gradient Optimization
Jincheng Mei, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvári, Dale Schuurmans
16 Jan 2023

Variance Reduction for Score Functions Using Optimal Baselines
Ronan L. Keane, H. Gao
27 Dec 2022

Coordinate Ascent for Off-Policy RL with Global Convergence Guarantees
Hsin-En Su, Yen-Ju Chen, Ping-Chun Hsieh, Xi Liu
OffRL
10 Dec 2022

When Bioprocess Engineering Meets Machine Learning: A Survey from the Perspective of Automated Bioprocess Development
Nghia Duong-Trung, Stefan Born, Jong Woo Kim, M. Schermeyer, Katharina Paulick, ..., Thorben Werner, Randolf Scholz, Lars Schmidt-Thieme, Peter Neubauer, Ernesto Martinez
02 Sep 2022

Momentum-Based Policy Gradient with Second-Order Information
Saber Salehkaleybar, Sadegh Khorasani, Negar Kiyavash, Niao He, Patrick Thiran
17 May 2022

A Reinforcement Learning Approach to Domain-Knowledge Inclusion Using Grammar Guided Symbolic Regression
Laure Crochepierre, Lydia Boudjeloud, Vincent Barbesant
09 Feb 2022

PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation
Matilde Gargiani, Andrea Zanelli, Andrea Martinelli, Tyler H. Summers, John Lygeros
01 Feb 2022

An Alternate Policy Gradient Estimator for Softmax Policies
Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, A. R. Mahmood
22 Dec 2021

Understanding the Effect of Stochasticity in Policy Optimization
Jincheng Mei, Bo Dai, Chenjun Xiao, Csaba Szepesvári, Dale Schuurmans
29 Oct 2021

Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization
Yuhao Ding, Junzi Zhang, Hyunin Lee, Javad Lavaei
19 Oct 2021

Coordinate-wise Control Variates for Deep Policy Gradients
Yuanyi Zhong, Yuanshuo Zhou, Jian-wei Peng
BDL
11 Jul 2021

Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits
Kaushik Roy, Tao Gui, Manas Gaur, A. Sheth
OffRL
25 Jun 2021

On Proximal Policy Optimization's Heavy-tailed Gradients
Saurabh Garg, Joshua Zhanson, Emilio Parisotto, Adarsh Prasad, J. Zico Kolter, Zachary Chase Lipton, Sivaraman Balakrishnan, Ruslan Salakhutdinov, Pradeep Ravikumar
20 Feb 2021