Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback

2 January 2019
Chicheng Zhang
Alekh Agarwal
Hal Daumé
John Langford
S. Negahban

Papers citing "Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback"

11 / 11 papers shown
Augmenting Online RL with Offline Data is All You Need: A Unified Hybrid RL Algorithm Design and Analysis
Ruiquan Huang
Donghao Li
Chengshuai Shi
Cong Shen
Jing Yang
OffRL
174
0
0
01 Jul 2025
Multi-Armed Bandits With Machine Learning-Generated Surrogate Rewards
Wenlong Ji
Yihan Pan
Ruihao Zhu
Lihua Lei
72
1
0
20 Jun 2025
Best Arm Identification with Possibly Biased Offline Data
Le Yang
Vincent Y. F. Tan
Wang Chi Cheung
91
1
0
29 May 2025
Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine
Prateek Jaiswal
Esmaeil Keyvanshokooh
Junyu Cao
115
0
0
22 May 2025
Warm Starting of CMA-ES for Contextual Optimization Problems
Yuta Sekino
Kento Uchida
Shinichi Shirakawa
188
0
0
18 Feb 2025
Online Bandit Learning with Offline Preference Data for Improved RLHF
Akhil Agnihotri
Rahul Jain
Deepak Ramachandran
Zheng Wen
OffRL
304
3
0
13 Jun 2024
Leveraging User-Triggered Supervision in Contextual Bandits
Alekh Agarwal
Claudio Gentile
T. V. Marinov
61
0
0
07 Feb 2023
Artificial Replay: A Meta-Algorithm for Harnessing Historical Data in Bandits
Siddhartha Banerjee
Sean R. Sinclair
Milind Tambe
Lily Xu
Chao Yu
AI4TS
268
8
0
30 Sep 2022
Thompson Sampling for Robust Transfer in Multi-Task Bandits
Zhi Wang
Chicheng Zhang
Kamalika Chaudhuri
AAML
96
7
0
17 Jun 2022
Multitask Bandit Learning Through Heterogeneous Feedback Aggregation
Zhi Wang
Chicheng Zhang
Manish Singh
L. Riek
Kamalika Chaudhuri
194
25
0
29 Oct 2020
Combining Offline Causal Inference and Online Bandit Learning for Data Driven Decision
Li Ye
Yishi Lin
Hong Xie
John C. S. Lui
CML
115
12
0
16 Jan 2020