Thompson Sampling for Contextual Bandits with Linear Payoffs

15 September 2012
Shipra Agrawal
Navin Goyal

Papers citing "Thompson Sampling for Contextual Bandits with Linear Payoffs"

19 / 19 papers shown
Prompt Optimization with Logged Bandit Data
Haruka Kiyohara
Daniel Yiming Cao
Yuta Saito
Thorsten Joachims
123
0
0
03 Apr 2025
Linear Bandits with Partially Observable Features
Wonyoung Hedge Kim
Sungwoo Park
G. Iyengar
A. Zeevi
Min Hwan Oh
113
1
0
10 Feb 2025
Distributed Thompson sampling under constrained communication
Saba Zerefa
Zhaolin Ren
Haitong Ma
Na Li
71
1
0
03 Jan 2025
AuctionNet: A Novel Benchmark for Decision-Making in Large-Scale Games
Kefan Su
Yusen Huo
Zhilin Zhang
Shuai Dou
Chuan Yu
Jian Xu
Zongqing Lu
Bo Zheng
101
7
0
31 Dec 2024
Planning and Learning in Risk-Aware Restless Multi-Arm Bandit Problem
Nima Akbarzadeh
Erick Delage
Yossiri Adulyasak
91
0
0
30 Oct 2024
HR-Bandit: Human-AI Collaborated Linear Recourse Bandit
Junyu Cao
Ruijiang Gao
Esmaeil Keyvanshokooh
102
1
0
18 Oct 2024
Second Order Bounds for Contextual Bandits with Function Approximation
Aldo Pacchiano
142
4
0
24 Sep 2024
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Arun Verma
Zhongxiang Dai
Xiaoqiang Lin
Patrick Jaillet
K. H. Low
84
5
0
24 Jul 2024
Online Bandit Learning with Offline Preference Data for Improved RLHF
Akhil Agnihotri
Rahul Jain
Deepak Ramachandran
Zheng Wen
OffRL
97
2
0
13 Jun 2024
On Bits and Bandits: Quantifying the Regret-Information Trade-off
Itai Shufaro
Nadav Merlis
Nir Weinberger
Shie Mannor
94
0
0
26 May 2024
Bayesian Off-Policy Evaluation and Learning for Large Action Spaces
Imad Aouali
Victor-Emmanuel Brunel
David Rohde
Anna Korba
OffRL
78
5
0
22 Feb 2024
Prior-Dependent Allocations for Bayesian Fixed-Budget Best-Arm Identification in Structured Bandits
Nicolas Nguyen
Imad Aouali
András Gyorgy
Claire Vernade
56
2
0
08 Feb 2024
Ensemble sampling for linear bandits: small ensembles suffice
David Janz
A. Litvak
Csaba Szepesvári
52
1
0
14 Nov 2023
Selective Uncertainty Propagation in Offline RL
Sanath Kumar Krishnamurthy
Shrey Modi
Tanmay Gangwani
S. Katariya
Branislav Kveton
A. Rangi
OffRL
113
0
0
01 Feb 2023
Safe Linear Thompson Sampling with Side Information
Ahmadreza Moradipari
Sanae Amani
M. Alizadeh
Christos Thrampoulidis
76
42
0
06 Nov 2019
Learning to Optimize Via Posterior Sampling
Daniel Russo
Benjamin Van Roy
114
697
0
11 Jan 2013
Further Optimal Regret Bounds for Thompson Sampling
Shipra Agrawal
Navin Goyal
64
443
0
15 Sep 2012
Thompson Sampling: An Asymptotically Optimal Finite Time Analysis
E. Kaufmann
N. Korda
Rémi Munos
79
585
0
18 May 2012
Towards minimax policies for online linear optimization with bandit feedback
Sébastien Bubeck
Nicolò Cesa-Bianchi
Sham Kakade
OffRL
109
149
0
14 Feb 2012