2112.10935
Nearly Optimal Policy Optimization with Stable at Any Time Guarantee
21 December 2021
Tianhao Wu
Yunchang Yang
Han Zhong
Liwei Wang
S. Du
Jiantao Jiao
Papers citing
"Nearly Optimal Policy Optimization with Stable at Any Time Guarantee"
12 / 12 papers shown
Title
Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes
Asaf B. Cassel
Aviv A. Rosenberg
35
1
0
03 Jul 2024
Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond
Xutong Liu
Siwei Wang
Jinhang Zuo
Han Zhong
Xuchuang Wang
Zhiyong Wang
Shuai Li
Mohammad Hajiesmaili
J. C. Lui
Wei Chen
85
1
0
03 Jun 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong
Guhao Feng
Li Zhao
Di He
Jiang Bian
Liwei Wang
55
56
0
29 Apr 2024
Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm
Miao Lu
Han Zhong
Tong Zhang
Jose H. Blanchet
OffRL
OOD
71
4
0
04 Apr 2024
Rethinking Model-based, Policy-based, and Value-based Reinforcement Learning via the Lens of Representation Complexity
Guhao Feng
Han Zhong
OffRL
68
3
0
28 Dec 2023
Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
Tianhao Wu
Banghua Zhu
Ruoyu Zhang
Zhaojin Wen
Kannan Ramchandran
Jiantao Jiao
31
54
0
30 Sep 2023
Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL
Qinghua Liu
Gellert Weisz
András Gyorgy
Chi Jin
Csaba Szepesvári
OffRL
16
8
0
18 May 2023
A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes
Han Zhong
Tong Zhang
19
26
0
15 May 2023
Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret
Han Zhong
Jiachen Hu
Yecheng Xue
Tongyang Li
Liwei Wang
18
4
0
21 Feb 2023
Online Policy Optimization for Robust MDP
Jing Dong
Jingwei Li
Baoxiang Wang
J. Zhang
OffRL
16
12
0
28 Sep 2022
Policy Optimization for Stochastic Shortest Path
Liyu Chen
Haipeng Luo
Aviv A. Rosenberg
19
12
0
07 Feb 2022
UCB Momentum Q-learning: Correcting the bias without forgetting
Pierre Menard
O. D. Domingues
Xuedong Shang
Michal Valko
77
40
0
01 Mar 2021