Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF. International Conference on Learning Representations (ICLR), 2024 |
Provable Policy Gradient Methods for Average-Reward Markov Potential Games. International Conference on Artificial Intelligence and Statistics (AISTATS), 2024 |
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration. Neural Information Processing Systems (NeurIPS), 2023 |
When Is Partially Observable Reinforcement Learning Not Scary? Annual Conference on Computational Learning Theory (COLT), 2022 |
Reward-Biased Maximum Likelihood Estimation for Neural Contextual Bandits. AAAI Conference on Artificial Intelligence (AAAI), 2022 |
Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems. Neural Information Processing Systems (NeurIPS), 2022 |
Learning in Markov Decision Processes under Constraints. IEEE Transactions on Control of Network Systems (TCNS), 2020 |