ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning

2 October 2021
Tong Zhang
arXiv:2110.00871

Papers citing "Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning" (49 papers)

Sparse Nonparametric Contextual Bandits
Hamish Flynn, Julia Olkhovskaya, Paul Rognon-Vael
20 Mar 2025

Decision Making in Hybrid Environments: A Model Aggregation Approach
Haolin Liu, Chen-Yu Wei, Julian Zimmert
09 Feb 2025

An Information-Theoretic Analysis of Thompson Sampling with Infinite Action Spaces
Amaury Gouverneur, Borja Rodríguez Gálvez, T. Oechtering, Mikael Skoglund
04 Feb 2025

How Does Variance Shape the Regret in Contextual Bandits?
Zeyu Jia, Jian Qian, Alexander Rakhlin, Chen-Yu Wei
16 Oct 2024

Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Arun Verma, Zhongxiang Dai, Xiaoqiang Lin, P. Jaillet, K. H. Low
24 Jul 2024

A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits
Junghyun Lee, Se-Young Yun, Kwang-Sung Jun
19 Jul 2024

Random Latent Exploration for Deep Reinforcement Learning
Srinath Mahankali, Zhang-Wei Hong, Ayush Sekhari, Alexander Rakhlin, Pulkit Agrawal
18 Jul 2024

More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling
Haque Ishfaq, Yixin Tan, Yu Yang, Qingfeng Lan, Jianfeng Lu, A. Rupam Mahmood, Doina Precup, Pan Xu
18 Jun 2024

From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems
Jianliang He, Siyu Chen, Fengzhuo Zhang, Zhuoran Yang
Tags: LM&Ro, LLMAG
30 May 2024

Feel-Good Thompson Sampling for Contextual Dueling Bandits
Xuheng Li, Heyang Zhao, Quanquan Gu
09 Apr 2024

Prior-dependent analysis of posterior sampling reinforcement learning with function approximation
Yingru Li, Zhi-Quan Luo
17 Mar 2024

Regret Minimization via Saddle Point Optimization
Johannes Kirschner, Seyed Alireza Bakhtiari, Kushagra Chandak, Volodymyr Tkachuk, Csaba Szepesvári
15 Mar 2024

Optimistic Information Directed Sampling
Gergely Neu, Matteo Papini, Ludovic Schwartz
23 Feb 2024

Contextual Multinomial Logit Bandits with General Value Functions
Mengxiao Zhang, Haipeng Luo
12 Feb 2024

Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
Chen Ye, Wei Xiong, Yuheng Zhang, Nan Jiang, Tong Zhang
Tags: OffRL
11 Feb 2024

Optimistic Thompson Sampling for No-Regret Learning in Unknown Games
Yingru Li, Liangqi Liu, Wenqiang Pu, Hao Liang, Zhi-Quan Luo
07 Feb 2024

Q-Star Meets Scalable Posterior Sampling: Bridging Theory and Practice via HyperAgent
Yingru Li, Jiawei Xu, Lei Han, Zhi-Quan Luo
Tags: BDL, OffRL
05 Feb 2024

On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling, and Beyond
Thanh Nguyen-Tang, Raman Arora
Tags: OffRL
06 Jan 2024

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
Wei Xiong, Hanze Dong, Chen Ye, Ziqi Wang, Han Zhong, Heng Ji, Nan Jiang, Tong Zhang
Tags: OffRL
18 Dec 2023

Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation
Nikki Lijing Kuang, Ming Yin, Mengdi Wang, Yu-Xiang Wang, Yian Ma
29 Oct 2023

Online Learning in Contextual Second-Price Pay-Per-Click Auctions
Mengxiao Zhang, Haipeng Luo
08 Oct 2023

Bayesian Design Principles for Frequentist Sequential Learning
Yunbei Xu, A. Zeevi
01 Oct 2023

Bypassing the Simulator: Near-Optimal Adversarial Linear Contextual Bandits
Haolin Liu, Chen-Yu Wei, Julian Zimmert
02 Sep 2023

VITS : Variational Inference Thompson Sampling for contextual bandits
Pierre Clavier, Tom Huix, Alain Durmus
19 Jul 2023

Efficient Model-Free Exploration in Low-Rank MDPs
Zakaria Mhammedi, Adam Block, Dylan J. Foster, Alexander Rakhlin
Tags: OffRL
08 Jul 2023

Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning
Amin Karbasi, Nikki Lijing Kuang, Yi-An Ma, Siddharth Mitra
Tags: OffRL
15 Jun 2023

Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
Zhihan Liu, Miao Lu, Wei Xiong, Han Zhong, Haotian Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang
Tags: OffRL
29 May 2023

Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo
Haque Ishfaq, Qingfeng Lan, Pan Xu, A. R. Mahmood, Doina Precup, Anima Anandkumar, Kamyar Azizzadenesheli
Tags: BDL, OffRL
29 May 2023

Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian rewards
Amaury Gouverneur, Borja Rodríguez Gálvez, T. Oechtering, Mikael Skoglund
26 Apr 2023

Initial Task Allocation for Multi-Human Multi-Robot Teams with Attention-based Deep Reinforcement Learning
Ruiqi Wang, Dezhong Zhao, Byung-Cheol Min
04 Mar 2023

Leveraging Demonstrations to Improve Online Learning: Quality Matters
Botao Hao, Rahul Jain, Tor Lattimore, Benjamin Van Roy, Zheng Wen
07 Feb 2023

Overcoming Prior Misspecification in Online Learning to Rank
Javad Azizi, Ofer Meshi, M. Zoghi, Maryam Karimzadehgan
25 Jan 2023

Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
Chen Ye, Wei Xiong, Quanquan Gu, Tong Zhang
12 Dec 2022

Eluder-based Regret for Stochastic Contextual MDPs
Orin Levy, Asaf B. Cassel, Alon Cohen, Yishay Mansour
27 Nov 2022

Model-Free Reinforcement Learning with the Decision-Estimation Coefficient
Dylan J. Foster, Noah Golowich, Jian Qian, Alexander Rakhlin, Ayush Sekhari
Tags: OffRL
25 Nov 2022

A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games
Wei Xiong, Han Zhong, Chengshuai Shi, Cong Shen, Tong Zhang
04 Oct 2022

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees
D. Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Rémi Munos, A. Naumov, Mark Rowland, Michal Valko, Pierre Menard
28 Sep 2022

A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning
Christoph Dann, M. Mohri, Tong Zhang, Julian Zimmert
Tags: OffRL
23 Aug 2022

Contextual Bandits with Large Action Spaces: Made Practical
Yinglun Zhu, Dylan J. Foster, John Langford, Paul Mineiro
12 Jul 2022

Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation
Christoph Dann, Yishay Mansour, M. Mohri, Ayush Sekhari, Karthik Sridharan
19 Jun 2022

Model-based RL with Optimistic Posterior Sampling: Structural Conditions and Sample Complexity
Alekh Agarwal, Tong Zhang
15 Jun 2022

Regret Bounds for Information-Directed Reinforcement Learning
Botao Hao, Tor Lattimore
Tags: OffRL
09 Jun 2022

Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits
Tianyuan Jin, Pan Xu, X. Xiao, Anima Anandkumar
07 Jun 2022

Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization
Hui Yuan, Chengzhuo Ni, Huazheng Wang, Xuezhou Zhang, Le Cong, Csaba Szepesvári, Mengdi Wang
05 Jun 2022

Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits
Gergely Neu, Julia Olkhovskaya, Matteo Papini, Ludovic Schwartz
27 May 2022

Non-Linear Reinforcement Learning in Large Action Spaces: Structural Conditions and Sample-efficiency of Posterior Sampling
Alekh Agarwal, Tong Zhang
15 Mar 2022

Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration
Priyank Agrawal, Jinglin Chen, Nan Jiang
23 Oct 2020

Nonstationary Reinforcement Learning with Linear Function Approximation
Huozhi Zhou, Jinglin Chen, L. Varshney, A. Jagmohan
08 Oct 2020

TS-UCB: Improving on Thompson Sampling With Little to No Additional Computation
Jackie Baek, Vivek F. Farias
11 Jun 2020