Reward Biased Maximum Likelihood Estimation for Reinforcement Learning

16 November 2020

Papers citing "Reward Biased Maximum Likelihood Estimation for Reinforcement Learning"

Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF. Shicong Cen, Jincheng Mei, Katayoon Goshvadi, Hanjun Dai, Tong Yang, Sherry Yang, Dale Schuurmans, Yuejie Chi, Bo Dai. 20 Feb 2025. [OffRL]
Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games. Tong Yang, Bo Dai, Lin Xiao, Yuejie Chi. 13 Feb 2025. [OffRL]
Provable Policy Gradient Methods for Average-Reward Markov Potential Games. Min Cheng, Ruida Zhou, P. R. Kumar, Chao Tian. 09 Mar 2024.
Value-Biased Maximum Likelihood Estimation for Model-based Reinforcement Learning in Discounted Linear MDPs. Yu-Heng Hung, Ping-Chun Hsieh, Akshay Mete, P. R. Kumar. 17 Oct 2023.
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration. Zhihan Liu, Miao Lu, Wei Xiong, Han Zhong, Haotian Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang. 29 May 2023. [OffRL]
When Is Partially Observable Reinforcement Learning Not Scary? Qinghua Liu, Alan Chung, Csaba Szepesvári, Chi Jin. 19 Apr 2022.
Reward-Biased Maximum Likelihood Estimation for Neural Contextual Bandits. Yu-Heng Hung, Ping-Chun Hsieh. 08 Mar 2022.
Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems. Akshay Mete, Rahul Singh, P. R. Kumar. 25 Jan 2022.
Reinforcement Learning for Finite-Horizon Restless Multi-Armed Multi-Action Bandits. Guojun Xiong, Jian Li, Rahul Singh. 20 Sep 2021.
Learning Augmented Index Policy for Optimal Service Placement at the Network Edge. Guojun Xiong, Rahul Singh, Jian Li. 10 Jan 2021.
Whittle index based Q-learning for restless bandits with average reward. Konstantin Avrachenkov, Vivek Borkar. 29 Apr 2020.
Learning in Markov Decision Processes under Constraints. Rahul Singh, Abhishek Gupta, Ness B. Shroff. 27 Feb 2020.