ResearchTrend.AI
On-Policy Deep Reinforcement Learning for the Average-Reward Criterion

14 June 2021
Yiming Zhang
Keith Ross
    OffRL

Papers citing "On-Policy Deep Reinforcement Learning for the Average-Reward Criterion"

21 / 21 papers shown
Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning
Hanyang Zhao
Haoxian Chen
Ji Zhang
D. Yao
Wenpin Tang
60
0
0
03 Feb 2025
Average-Reward Reinforcement Learning with Entropy Regularization
Jacob Adamczyk
Volodymyr Makarenko
Stas Tiomkin
R. Kulkarni
OOD
55
2
0
17 Jan 2025
An Empirical Study of Deep Reinforcement Learning in Continuing Tasks
Yi Wan
D. Korenkevych
Zheqing Zhu
OffRL
CLL
45
0
0
12 Jan 2025
RVI-SAC: Average Reward Off-Policy Deep Reinforcement Learning
Yukinari Hisaki
Isao Ono
19
2
0
04 Aug 2024
NeoRL: Efficient Exploration for Nonepisodic RL
Bhavya Sukhija
Lenart Treven
Florian Dörfler
Stelian Coros
Andreas Krause
OffRL
30
0
0
03 Jun 2024
OMPO: A Unified Framework for RL under Policy and Dynamics Shifts
Yu-Juan Luo
Tianying Ji
Gang Hua
Jianwei Zhang
Huazhe Xu
Xianyuan Zhan
OffRL
64
3
0
29 May 2024
Intervention-Assisted Policy Gradient Methods for Online Stochastic Queuing Network Optimization: Technical Report
Jerrod Wigmore
B. Shrader
E. Modiano
OffRL
26
1
0
05 Apr 2024
Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles
Bhrij Patel
Wesley A Suttle
Alec Koppel
Vaneet Aggarwal
Brian M Sadler
Amrit Singh Bedi
Dinesh Manocha
32
1
0
18 Mar 2024
Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces
Brahma S. Pavse
M. Zurek
Yudong Chen
Qiaomin Xie
Josiah P. Hanna
OffRL
33
1
0
02 Jun 2023
Policy Optimization for Continuous Reinforcement Learning
Hanyang Zhao
Wenpin Tang
D. Yao
OffRL
32
17
0
30 May 2023
Model-Free Robust Average-Reward Reinforcement Learning
Yue Wang
Alvaro Velasquez
George K. Atia
Ashley Prater-Bennette
Shaofeng Zou
32
9
0
17 May 2023
Robust Average-Reward Markov Decision Processes
Yue Wang
Alvaro Velasquez
George K. Atia
Ashley Prater-Bennette
Shaofeng Zou
33
11
0
02 Jan 2023
Adaptive patch foraging in deep reinforcement learning agents
Nathan J. Wispinski
Andrew Butcher
K. Mathewson
Craig S. Chapman
M. Botvinick
P. Pilarski
16
8
0
14 Oct 2022
Deriving time-averaged active inference from control principles
Eli Sennesh
J. Theriault
Jan-Willem van de Meent
L. F. Barrett
K. Quigley
AI4TS
AI4CE
24
3
0
22 Aug 2022
Improving Sample Efficiency in Evolutionary RL Using Off-Policy Ranking
R. EshwarS
Shishir Kolathaya
Gugan Thoppe
14
0
0
22 Aug 2022
Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning
Xiaoteng Ma
Shuai Ma
Li Xia
Qianchuan Zhao
11
3
0
15 Jun 2022
Transportation-Inequalities, Lyapunov Stability and Sampling for Dynamical Systems on Continuous State Space
Muhammad Naeem
Miroslav Pajic
14
3
0
25 May 2022
Stochastic first-order methods for average-reward Markov decision processes
Tianjiao Li
Feiyang Wu
Guanghui Lan
22
13
0
11 May 2022
Refined Policy Improvement Bounds for MDPs
J. Dai
Mark O. Gluzman
14
3
0
16 Jul 2021
Average-Reward Reinforcement Learning with Trust Region Methods
Xiaoteng Ma
Xiao-Jing Tang
Li Xia
Jun Yang
Qianchuan Zhao
16
16
0
07 Jun 2021
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
Chen-Yu Wei
Mehdi Jafarnia-Jahromi
Haipeng Luo
Hiteshi Sharma
R. Jain
107
99
0
15 Oct 2019