
Reward Biased Maximum Likelihood Estimation for Reinforcement Learning

Conference on Learning for Dynamics & Control (L4DC), 2020
16 November 2020
Akshay Mete
Rahul Singh
Xi Liu
P. R. Kumar

Papers citing "Reward Biased Maximum Likelihood Estimation for Reinforcement Learning"

12 papers
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
International Conference on Learning Representations (ICLR), 2024
Shicong Cen
Jincheng Mei
Katayoon Goshvadi
Hanjun Dai
Tong Yang
Sherry Yang
Dale Schuurmans
Yuejie Chi
Bo Dai
OffRL
768
65
0
20 Feb 2025
Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games
Tong Yang
Bo Dai
Lin Xiao
Yuejie Chi
OffRL
477
4
0
13 Feb 2025
Provable Policy Gradient Methods for Average-Reward Markov Potential Games
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Min Cheng
Ruida Zhou
P. R. Kumar
Chao Tian
334
8
0
09 Mar 2024
Value-Biased Maximum Likelihood Estimation for Model-based Reinforcement Learning in Discounted Linear MDPs
Yu-Heng Hung
Ping-Chun Hsieh
Akshay Mete
P. R. Kumar
220
2
0
17 Oct 2023
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
Neural Information Processing Systems (NeurIPS), 2023
Zhihan Liu
Miao Lu
Wei Xiong
Han Zhong
Haotian Hu
Shenao Zhang
Sirui Zheng
Zhuoran Yang
Zhaoran Wang
OffRL
395
27
0
29 May 2023
When Is Partially Observable Reinforcement Learning Not Scary?
Annual Conference on Computational Learning Theory (COLT), 2022
Qinghua Liu
Alan Chung
Csaba Szepesvári
Chi Jin
280
125
0
19 Apr 2022
Reward-Biased Maximum Likelihood Estimation for Neural Contextual Bandits
AAAI Conference on Artificial Intelligence (AAAI), 2022
Yu-Heng Hung
Ping-Chun Hsieh
294
2
0
08 Mar 2022
Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems
Neural Information Processing Systems (NeurIPS), 2022
Akshay Mete
Rahul Singh
P. R. Kumar
168
10
0
25 Jan 2022
Reinforcement Learning for Finite-Horizon Restless Multi-Armed Multi-Action Bandits
Efstathia Soufleri
Jian Li
Rahul Singh
277
4
0
20 Sep 2021
Learning Augmented Index Policy for Optimal Service Placement at the Network Edge
Efstathia Soufleri
Rahul Singh
Jian Li
305
10
0
10 Jan 2021
Whittle index based Q-learning for restless bandits with average reward
Konstantin Avrachenkov
Vivek Borkar
307
84
0
29 Apr 2020
Learning in Markov Decision Processes under Constraints
IEEE Transactions on Control of Network Systems (TCNS), 2020
Rahul Singh
Abhishek Gupta
Ness B. Shroff
458
32
0
27 Feb 2020