2011.07738
Cited By
Reward Biased Maximum Likelihood Estimation for Reinforcement Learning
16 November 2020
Akshay Mete
Rahul Singh
Xi Liu
P. R. Kumar
Papers citing
"Reward Biased Maximum Likelihood Estimation for Reinforcement Learning"
12 / 12 papers shown
Title
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Shicong Cen
Jincheng Mei
Katayoon Goshvadi
Hanjun Dai
Tong Yang
Sherry Yang
Dale Schuurmans
Yuejie Chi
Bo Dai
OffRL
148
37
0
20 Feb 2025
Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games
Tong Yang
Bo Dai
Lin Xiao
Yuejie Chi
OffRL
135
2
0
13 Feb 2025
Provable Policy Gradient Methods for Average-Reward Markov Potential Games
Min Cheng
Ruida Zhou
P. R. Kumar
Chao Tian
98
5
0
09 Mar 2024
Value-Biased Maximum Likelihood Estimation for Model-based Reinforcement Learning in Discounted Linear MDPs
Yu-Heng Hung
Ping-Chun Hsieh
Akshay Mete
P. R. Kumar
41
0
0
17 Oct 2023
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
Zhihan Liu
Miao Lu
Wei Xiong
Han Zhong
Haotian Hu
Shenao Zhang
Sirui Zheng
Zhuoran Yang
Zhaoran Wang
OffRL
124
22
0
29 May 2023
When Is Partially Observable Reinforcement Learning Not Scary?
Qinghua Liu
Alan Chung
Csaba Szepesvári
Chi Jin
75
98
0
19 Apr 2022
Reward-Biased Maximum Likelihood Estimation for Neural Contextual Bandits
Yu-Heng Hung
Ping-Chun Hsieh
63
2
0
08 Mar 2022
Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems
Akshay Mete
Rahul Singh
P. R. Kumar
57
8
0
25 Jan 2022
Reinforcement Learning for Finite-Horizon Restless Multi-Armed Multi-Action Bandits
Guojun Xiong
Jian Li
Rahul Singh
45
4
0
20 Sep 2021
Learning Augmented Index Policy for Optimal Service Placement at the Network Edge
Guojun Xiong
Rahul Singh
Jian Li
91
9
0
10 Jan 2021
Whittle index based Q-learning for restless bandits with average reward
Konstantin Avrachenkov
Vivek Borkar
77
70
0
29 Apr 2020
Learning in Markov Decision Processes under Constraints
Rahul Singh
Abhishek Gupta
Ness B. Shroff
135
28
0
27 Feb 2020