Online Bandit Learning with Offline Preference Data

13 June 2024
Akhil Agnihotri
Rahul Jain
Deepak Ramachandran
Zheng Wen
    OffRL
arXiv: 2406.09574

Papers citing "Online Bandit Learning with Offline Preference Data"

e-COP : Episodic Constrained Optimization of Policies
Akhil Agnihotri
Rahul Jain
Deepak Ramachandran
Sahil Singla
OffRL
13 Jun 2024
Artificial Replay: A Meta-Algorithm for Harnessing Historical Data in Bandits
Siddhartha Banerjee
Sean R. Sinclair
Milind Tambe
Lily Xu
C. Yu
AI4TS
30 Sep 2022