Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes

Annual Conference Computational Learning Theory (COLT), 2020
15 December 2020
Dongruo Zhou
Quanquan Gu
Csaba Szepesvári

Papers citing "Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes"

50 / 168 papers shown
Distributionally Robust Online Markov Game with Linear Function Approximation
Zewu Zheng
Yuanyuan Lin
OOD, OffRL
356
0
0
11 Nov 2025
Vector-valued self-normalized concentration inequalities beyond sub-Gaussianity
Diego Martinez-Taboada
Tomás González
Aaditya Ramdas
119
3
0
05 Nov 2025
Variance-Aware Feel-Good Thompson Sampling for Contextual Bandits
Xuheng Li
Quanquan Gu
153
1
0
03 Nov 2025
Q-Learning with Shift-Aware Upper Confidence Bound in Non-Stationary Reinforcement Learning
H. Bui
Felix Parker
Kimia Ghobadi
Anqi Liu
OOD, OffRL
188
0
0
03 Oct 2025
Replicable Reinforcement Learning with Linear Function Approximation
Eric Eaton
Marcel Hussing
Michael Kearns
Aaron Roth
S. B. Sengupta
Jessica Sorrell
237
3
0
10 Sep 2025
Outcome-based Exploration for LLM Reasoning
Yuda Song
Julia Kempe
Remi Munos
OffRL, LRM
321
49
0
08 Sep 2025
ORVIT: Near-Optimal Online Distributionally Robust Reinforcement Learning
Debamita Ghosh
George Atia
Yue Wang
OffRL, OOD
439
3
0
05 Aug 2025
Instance-Dependent Continuous-Time Reinforcement Learning via Maximum Likelihood Estimation
Runze Zhao
Yue Yu
Ruhan Wang
Chunfeng Huang
Dongruo Zhou
268
0
0
04 Aug 2025
Generalized Kernelized Bandits: A Novel Self-Normalized Bernstein-Like Dimension-Free Inequality and Regret Bounds
Alberto Maria Metelli
Simone Drago
Marco Mussi
192
2
0
03 Aug 2025
The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability
Jiachen Hu
Rui Ai
Han Zhong
Xiaoyu Chen
L. Wang
Zhaoran Wang
Zhuoran Yang
249
0
0
11 Jun 2025
Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration
Youngmin Oh
J. Park
Taejin Paik
Jaemin Park
271
1
0
02 Jun 2025
Linear Mixture Distributionally Robust Markov Decision Processes
Zhishuai Liu
Pan Xu
366
6
0
23 May 2025
Provably Efficient Multi-Objective Bandit Algorithms under Preference-Centric Customization
Linfeng Cao
Ming Shi
Ness B. Shroff
239
2
0
19 Feb 2025
Improved Regret Analysis in Gaussian Process Bandits: Optimality for Noiseless Reward, RKHS norm, and Non-Stationary Variance
S. Iwazaki
Shion Takeno
392
9
0
10 Feb 2025
Catoni Contextual Bandits are Robust to Heavy-tailed Rewards
Chenlu Ye
Yujia Jin
Alekh Agarwal
Tong Zhang
491
1
0
04 Feb 2025
Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation
Neural Information Processing Systems (NeurIPS), 2024
Long-Fei Li
Yu Zhang
Peng Zhao
Zhi Zhou
654
10
0
17 Jan 2025
Digital Twin Calibration with Model-Based Reinforcement Learning
Hua Zheng
Wei Xie
I. Ryzhov
Keilung Choy
433
0
0
04 Jan 2025
Variance-Aware Linear UCB with Deep Representation for Neural Contextual Bandits
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
H. Bui
Enrique Mallada
Anqi Liu
1.2K
4
0
08 Nov 2024
Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs
Neural Information Processing Systems (NeurIPS), 2024
Long-Fei Li
Peng Zhao
Zhi Zhou
283
4
0
05 Nov 2024
Demystifying Linear MDPs and Novel Dynamics Aggregation Framework
International Conference on Learning Representations (ICLR), 2024
Joongkyu Lee
Min-hwan Oh
339
5
0
31 Oct 2024
Learning Infinite-Horizon Average-Reward Linear Mixture MDPs of Bounded Span
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Woojin Chae
Kihyuk Hong
Yufan Zhang
Ambuj Tewari
Dabeen Lee
208
1
0
19 Oct 2024
Upper and Lower Bounds for Distributionally Robust Off-Dynamics Reinforcement Learning
Zhishuai Liu
Weixin Wang
Pan Xu
412
13
0
30 Sep 2024
Second Order Bounds for Contextual Bandits with Function Approximation
International Conference on Learning Representations (ICLR), 2024
Aldo Pacchiano
689
9
0
24 Sep 2024
Hybrid Reinforcement Learning Breaks Sample Size Barriers in Linear MDPs
Neural Information Processing Systems (NeurIPS), 2024
Kevin Tan
Wei Fan
Yuting Wei
OffRL
368
5
0
08 Aug 2024
Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning
Dake Zhang
Boxiang Lyu
Delin Qu
Mladen Kolar
Tong Zhang
OffRL
293
3
0
10 Jul 2024
Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes
Asaf B. Cassel
Aviv A. Rosenberg
366
5
0
03 Jul 2024
Uncertainty-Aware Reward-Free Exploration with General Function Approximation
Junkai Zhang
Weitong Zhang
Dongruo Zhou
Q. Gu
493
6
0
24 Jun 2024
Imitation Learning in Discounted Linear MDPs without exploration assumptions
International Conference on Machine Learning (ICML), 2024
Luca Viano
Stratis Skoulakis
Volkan Cevher
355
9
0
03 May 2024
Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation
Jianliang He
Han Zhong
Zhuoran Yang
355
6
0
19 Apr 2024
Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm
Neural Information Processing Systems (NeurIPS), 2024
Miao Lu
Han Zhong
Tong Zhang
Jose H. Blanchet
OffRL, OOD
284
22
0
04 Apr 2024
Sample Complexity of Offline Distributionally Robust Linear Markov Decision Processes
He Wang
Laixi Shi
Yuejie Chi
OffRL
494
14
0
19 Mar 2024
Prior-dependent analysis of posterior sampling reinforcement learning with function approximation
Yingru Li
Zhi-Quan Luo
232
0
0
17 Mar 2024
Horizon-Free Regret for Linear Markov Decision Processes
Zihan Zhang
Jason D. Lee
Yuxin Chen
Simon S. Du
254
4
0
15 Mar 2024
Variance-Dependent Regret Bounds for Non-stationary Linear Bandits
Zhiyong Wang
Jize Xie
Yi Chen
J. C. Lui
Dongruo Zhou
345
1
0
15 Mar 2024
Regret Minimization via Saddle Point Optimization
Neural Information Processing Systems (NeurIPS), 2024
Johannes Kirschner
Seyed Alireza Bakhtiari
Kushagra Chandak
Volodymyr Tkachuk
Csaba Szepesvári
240
2
0
15 Mar 2024
A Natural Extension To Online Algorithms For Hybrid RL With Limited Coverage
Kevin Tan
Ziping Xu
OffRL, OnRL
395
5
0
07 Mar 2024
Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Long-Fei Li
Peng Zhao
Zhi Zhou
368
7
0
07 Mar 2024
Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation
Yu Chen
Xiangcheng Zhang
Siwei Wang
Longbo Huang
413
3
0
28 Feb 2024
Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path
Qiwei Di
Jiafan He
Dongruo Zhou
Quanquan Gu
231
2
0
14 Feb 2024
Noise-Adaptive Confidence Sets for Linear Bandits and Application to Bayesian Optimization
International Conference on Machine Learning (ICML), 2024
Kwang-Sung Jun
Jungtaek Kim
317
4
0
12 Feb 2024
A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees
Toshinori Kitamura
Tadashi Kozuno
Masahiro Kato
Yuki Ichihara
Soichiro Nishimori
Akiyoshi Sannai
Sho Sonoda
Wataru Kumagai
Yutaka Matsuo
566
5
0
31 Jan 2024
Rethinking Model-based, Policy-based, and Value-based Reinforcement Learning via the Lens of Representation Complexity
Neural Information Processing Systems (NeurIPS), 2023
Guhao Feng
Han Zhong
OffRL
311
5
0
28 Dec 2023
Conservative Exploration for Policy Optimization via Off-Policy Policy Evaluation
Paul Daoudi
Mathias Formoso
Othman Gaizi
Achraf Azize
Evrard Garcelon
OffRL
233
0
0
24 Dec 2023
Horizon-Free and Instance-Dependent Regret Bounds for Reinforcement Learning with General Function Approximation
Jiayi Huang
Han Zhong
Liwei Wang
Lin F. Yang
217
4
0
07 Dec 2023
Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback
Neural Information Processing Systems (NeurIPS), 2023
Canzhe Zhao
Ruofeng Yang
Baoxiang Wang
Xuezhou Zhang
Shuai Li
294
4
0
14 Nov 2023
Federated Linear Bandits with Finite Adversarial Actions
Neural Information Processing Systems (NeurIPS), 2023
Li Fan
Ruida Zhou
Chao Tian
Cong Shen
FedML
383
3
0
02 Nov 2023
Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2023
Ahmadreza Moradipari
M. Pedramfar
Modjtaba Shokrian Zini
Vaneet Aggarwal
339
6
0
30 Oct 2023
Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation
Neural Information Processing Systems (NeurIPS), 2023
Nikki Lijing Kuang
Ming Yin
Mengdi Wang
Yu Wang
Yian Ma
364
7
0
29 Oct 2023
A Doubly Robust Approach to Sparse Reinforcement Learning
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Wonyoung Hedge Kim
Garud Iyengar
A. Zeevi
245
5
0
23 Oct 2023
Value-Biased Maximum Likelihood Estimation for Model-based Reinforcement Learning in Discounted Linear MDPs
Yu-Heng Hung
Ping-Chun Hsieh
Akshay Mete
P. R. Kumar
220
2
0
17 Oct 2023