All Papers

Title
Improved Training Mechanism for Reinforcement Learning via Online Model Selection Aida Afshar Aldo Pacchiano 40 0 0 01 Dec 2025
A Polynomial-time Algorithm for Online Sparse Linear Regression with Improved Regret Bound under Weaker Conditions Annual Conference Computational Learning Theory (COLT), 2025 Junfan Li Shizhong Liao Zenglin Xu L. Nie 80 0 0 31 Oct 2025
UCB-type Algorithm for Budget-Constrained Expert Learning Ilgam Latypov A. Suvorikova Alexey Kroshnin Alexander Gasnikov Yuriy Dorn 80 0 0 26 Oct 2025
Data-Dependent Regret Bounds for Constrained MABs Gianmarco Genalti Francesco Emanuele Stradi Matteo Castiglioni A. Marchesi N. Gatti 347 0 0 26 May 2025
Sparse Nonparametric Contextual Bandits Hamish Flynn Julia Olkhovskaya Paul Rognon-Vael 323 0 0 20 Mar 2025
Offline-to-online hyperparameter transfer for stochastic bandits AAAI Conference on Artificial Intelligence (AAAI), 2025 Dravyansh Sharma Arun Sai Suggala OffRL 279 8 0 06 Jan 2025
A Model Selection Approach for Corruption Robust Reinforcement Learning International Conference on Algorithmic Learning Theory (ALT), 2021 Chen-Yu Wei Christoph Dann Julian Zimmert 281 48 0 31 Dec 2024
Adaptive Resource Allocation for Virtualized Base Stations in O-RAN with Online Learning IEEE Transactions on Communications (IEEE Trans. Commun.), 2023 Michail Kalntis Georgios Iosifidis Fernando A. Kuipers 149 10 0 31 Dec 2024
Model Selection for Average Reward RL with Application to Utility Maximization in Repeated Games Alireza Masoumian James R. Wright 405 2 0 09 Nov 2024
Bayesian Optimisation with Unknown Hyperparameters: Regret Bounds Logarithmically Closer to Optimal Neural Information Processing Systems (NeurIPS), 2024 Juliusz Ziomek Masaki Adachi Michael A. Osborne 407 4 0 14 Oct 2024
Stochastic Bandits Robust to Adversarial Attacks Xuchuang Wang Jinhang Zuo Xutong Liu John C. S. Lui Mohammad Hajiesmaili AAML 133 0 0 16 Aug 2024
Learning Rate-Free Reinforcement Learning: A Case for Model Selection with Non-Stationary Objectives Aida Afshar Aldo Pacchiano 189 0 0 07 Aug 2024
Causal Bandits: The Pareto Optimal Frontier of Adaptivity, a Reduction to Linear Bandits, and Limitations around Unknown Marginals Ziyi Liu Idan Attias Daniel M. Roy CML 183 2 0 01 Jul 2024
Efficient Sequential Decision Making with Large Language Models Dingyang Chen Qi Zhang Yinglun Zhu LRM 400 9 0 17 Jun 2024
No-Regret Learning for Fair Multi-Agent Social Welfare Optimization Mengxiao Zhang Ramiro Deo-Campo Vuong Haipeng Luo 184 4 0 31 May 2024
Symmetric Linear Bandits with Hidden Symmetry Nam-Phuong Tran T. Ta Debmalya Mandal Long Tran-Thanh 310 1 0 22 May 2024
Incentive-compatible Bandits: Importance Weighting No More Julian Zimmert T. V. Marinov 175 0 0 10 May 2024
Online Bandits with (Biased) Offline Data: Adaptive Learning under Distribution Mismatch International Conference on Machine Learning (ICML), 2024 Wang Chi Cheung Lixing Lyu OffRL 385 12 0 04 May 2024
The SMART approach to instance-optimal online learning Siddhartha Banerjee Alankrita Bhatt Chao Yu 173 0 0 27 Feb 2024
Model Assessment and Selection under Temporal Distribution Shift Elise Han Chengpiao Huang Kaizheng Wang OOD 282 6 0 13 Feb 2024
Experiment Planning with Function Approximation Neural Information Processing Systems (NeurIPS), 2024 Aldo Pacchiano Jonathan Lee Emma Brunskill OffRL 183 4 0 10 Jan 2024
Best-of-Both-Worlds Algorithms for Linear Contextual Bandits Yuko Kuroki Alberto Rumi Taira Tsuchiya Fabio Vitale Nicolò Cesa-Bianchi 271 11 0 24 Dec 2023
An Improved Relaxation for Oracle-Efficient Adversarial Contextual Bandits Neural Information Processing Systems (NeurIPS), 2023 Kiarash Banihashem Mohammadtaghi Hajiaghayi Suho Shin Max Springer 258 1 0 29 Oct 2023
Bypassing the Simulator: Near-Optimal Adversarial Linear Contextual Bandits Neural Information Processing Systems (NeurIPS), 2023 Haolin Liu Chen-Yu Wei Julian Zimmert 236 11 0 02 Sep 2023
Anytime Model Selection in Linear Bandits Neural Information Processing Systems (NeurIPS), 2023 Parnian Kassraie N. Emmenegger Andreas Krause Aldo Pacchiano 289 7 0 24 Jul 2023
Data-Driven Online Model Selection With Regret Guarantees International Conference on Artificial Intelligence and Statistics (AISTATS), 2023 Aldo Pacchiano Christoph Dann Claudio Gentile OffRL 321 9 0 05 Jun 2023
Adaptation to Misspecified Kernel Regularity in Kernelised Bandits International Conference on Artificial Intelligence and Statistics (AISTATS), 2023 Yusha Liu Aarti Singh 247 3 0 26 Apr 2023
Improved Regret Bounds for Online Kernel Selection under Bandit Feedback Junfan Li Shizhong Liao 110 1 0 09 Mar 2023
A Blackbox Approach to Best of Both Worlds in Bandits and Beyond Annual Conference Computational Learning Theory (COLT), 2023 Christoph Dann Chen-Yu Wei Julian Zimmert 213 28 0 20 Feb 2023
Estimating Optimal Policy Value in General Linear Contextual Bandits Jonathan Lee Weihao Kong Aldo Pacchiano Vidya Muthukumar Emma Brunskill 174 0 0 19 Feb 2023
Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits Yue Kang Cho-Jui Hsieh T. C. Lee 240 2 0 18 Feb 2023
Leveraging User-Triggered Supervision in Contextual Bandits Alekh Agarwal Claudio Gentile T. V. Marinov 157 0 0 07 Feb 2023
Banker Online Mirror Descent: A Universal Approach for Delayed Online Bandit Learning International Conference on Machine Learning (ICML), 2023 Jiatai Huang Yan Dai Longbo Huang 260 7 0 25 Jan 2023
Stochastic Rising Bandits International Conference on Machine Learning (ICML), 2022 Alberto Maria Metelli F. Trovò Matteo Pirola Marcello Restelli 152 18 0 07 Dec 2022
Oracle Inequalities for Model Selection in Offline Reinforcement Learning Neural Information Processing Systems (NeurIPS), 2022 Jonathan Lee George Tucker Ofir Nachum Bo Dai Emma Brunskill OffRL 322 14 0 03 Nov 2022
Lifelong Bandit Optimization: No Prior and No Regret Conference on Uncertainty in Artificial Intelligence (UAI), 2022 Felix Schur Parnian Kassraie Jonas Rothfuss Andreas Krause 282 3 0 27 Oct 2022
One Arrow, Two Kills: An Unified Framework for Achieving Optimal Regret Guarantees in Sleeping Bandits International Conference on Artificial Intelligence and Statistics (AISTATS), 2022 Pierre Gaillard Aadirupa Saha Soham Dan 183 3 0 26 Oct 2022
Eigen Memory Trees Mark Rucker Jordan T. Ash John Langford Paul Mineiro Ida Momennejad 163 0 0 25 Oct 2022
Conditionally Risk-Averse Contextual Bandits Mónika Farsang Paul Mineiro Wangda Zhang 208 2 0 24 Oct 2022
Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees Neural Information Processing Systems (NeurIPS), 2022 Andrea Tirinzoni Matteo Papini Ahmed Touati A. Lazaric Matteo Pirotta 234 6 0 24 Oct 2022
Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity Neural Information Processing Systems (NeurIPS), 2022 Abhishek Gupta Aldo Pacchiano Yuexiang Zhai Sham Kakade Sergey Levine OffRL 193 93 0 18 Oct 2022
Neural Design for Genetic Perturbation Experiments International Conference on Learning Representations (ICLR), 2022 Aldo Pacchiano Drausin Wulsin Robert A. Barton L. Voloch 218 7 0 26 Jul 2022
Exploration in Linear Bandits with Rich Action Sets and its Implications for Inference International Conference on Artificial Intelligence and Statistics (AISTATS), 2022 Debangshu Banerjee Avishek Ghosh Sayak Ray Chowdhury Aditya Gopalan 247 10 0 23 Jul 2022
Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces International Conference on Machine Learning (ICML), 2022 Yinglun Zhu Paul Mineiro 190 18 0 12 Jul 2022
Model Selection in Reinforcement Learning with General Function Approximations Avishek Ghosh Sayak Ray Chowdhury 114 3 0 06 Jul 2022
Best of Both Worlds Model Selection Neural Information Processing Systems (NeurIPS), 2022 Aldo Pacchiano Christoph Dann Claudio Gentile 192 11 0 29 Jun 2022
Adversarial Bandits against Arbitrary Strategies Jung-hun Kim Se-Young Yun 357 0 0 30 May 2022
Breaking the $\sqrt{T}$ Barrier: Instance-Independent Logarithmic Regret in Stochastic Contextual Linear Bandits International Conference on Machine Learning (ICML), 2022 Avishek Ghosh Abishek Sankararaman 167 5 0 19 May 2022
Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions Neural Information Processing Systems (NeurIPS), 2022 Jiafan He Dongruo Zhou Tong Zhang Quanquan Gu 233 53 0 13 May 2022
Leveraging Initial Hints for Free in Stochastic Linear Bandits International Conference on Algorithmic Learning Theory (ALT), 2022 Ashok Cutkosky Christoph Dann Abhimanyu Das Qiuyi Qiuyi Zhang 139 4 0 08 Mar 2022


