Beyond variance reduction: Understanding the true impact of baselines on policy optimization

International Conference on Machine Learning (ICML), 2024
31 August 2020
Wesley Chung, Valentin Thomas, Marlos C. Machado, Nicolas Le Roux
OffRL
ArXiv (abs) · PDF · HTML

Papers citing "Beyond variance reduction: Understanding the true impact of baselines on policy optimization"

20 / 20 papers shown
Multi-level Advantage Credit Assignment for Cooperative Multi-Agent Reinforcement Learning
Xutong Zhao, Yaqi Xie
09 Aug 2025

TreeRL: LLM Reinforcement Learning with On-Policy Tree Search
Zhenyu Hou, Ziniu Hu, Yujiang Li, Rui Lu, Jie Tang, Yuxiao Dong
OffRL, LRM
13 Jun 2025

Breaking Habits: On the Role of the Advantage Function in Learning Causal State Representations
Miguel Suau
CML
13 Jun 2025

Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Nicolas Le Roux, Marc G. Bellemare, Jonathan Lebensold, Arnaud Bergeron, Joshua Greaves, Alex Fréchette, Carolyne Pelletier, Eric Thibodeau-Laufer, Sándor Toth, Sam Work
OffRL
18 Mar 2025

Stochastic Gradient Succeeds for Bandits
Jincheng Mei, Zixin Zhong, Bo Dai, Alekh Agarwal, Csaba Szepesvári, Dale Schuurmans
27 Feb 2024

Behind the Myth of Exploration in Policy Gradients
Adrien Bolland, Gaspard Lambrechts, Damien Ernst
31 Jan 2024

Target-independent XLA optimization using Reinforcement Learning
Milan Ganai, Haichen Li, Theodore Enns, Yida Wang, Randy Huang
28 Aug 2023

The Role of Baselines in Policy Gradient Optimization
Jincheng Mei, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvári, Dale Schuurmans
16 Jan 2023

Variance Reduction for Score Functions Using Optimal Baselines
Ronan L. Keane, H. Gao
27 Dec 2022

Coordinate Ascent for Off-Policy RL with Global Convergence Guarantees
Hsin-En Su, Yen-Ju Chen, Ping-Chun Hsieh, Xi Liu
OffRL
10 Dec 2022

When Bioprocess Engineering Meets Machine Learning: A Survey from the Perspective of Automated Bioprocess Development
Nghia Duong-Trung, Stefan Born, Jong Woo Kim, M. Schermeyer, Katharina Paulick, ..., Thorben Werner, Randolf Scholz, Lars Schmidt-Thieme, Peter Neubauer, Ernesto Martinez
02 Sep 2022

Momentum-Based Policy Gradient with Second-Order Information
Saber Salehkaleybar, Sadegh Khorasani, Negar Kiyavash, Niao He, Patrick Thiran
17 May 2022

A Reinforcement Learning Approach to Domain-Knowledge Inclusion Using Grammar Guided Symbolic Regression
Laure Crochepierre, Lydia Boudjeloud, Vincent Barbesant
09 Feb 2022

PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation
Matilde Gargiani, Andrea Zanelli, Andrea Martinelli, Tyler H. Summers, John Lygeros
01 Feb 2022

An Alternate Policy Gradient Estimator for Softmax Policies
Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, A. R. Mahmood
22 Dec 2021

Understanding the Effect of Stochasticity in Policy Optimization
Jincheng Mei, Bo Dai, Chenjun Xiao, Csaba Szepesvári, Dale Schuurmans
29 Oct 2021

Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization
Yuhao Ding, Junzi Zhang, Hyunin Lee, Javad Lavaei
19 Oct 2021

Coordinate-wise Control Variates for Deep Policy Gradients
Yuanyi Zhong, Yuanshuo Zhou, Jian-wei Peng
BDL
11 Jul 2021

Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits
Kaushik Roy, Tao Gui, Manas Gaur, A. Sheth
OffRL
25 Jun 2021

On Proximal Policy Optimization's Heavy-tailed Gradients
Saurabh Garg, Joshua Zhanson, Emilio Parisotto, Adarsh Prasad, J. Zico Kolter, Zachary Chase Lipton, Sivaraman Balakrishnan, Ruslan Salakhutdinov, Pradeep Ravikumar
20 Feb 2021