Kernelized Offline Contextual Dueling Bandits

21 July 2023
Viraj Mehta
Ojash Neopane
Vikramjeet Das
Sen Lin
J. Schneider
W. Neiswanger
    OffRL

Papers citing "Kernelized Offline Contextual Dueling Bandits"

6 / 6 papers shown
Bandits with Preference Feedback: A Stackelberg Game Perspective
Barna Pásztor
Parnian Kassraie
Andreas Krause
20
2
0
24 Jun 2024
Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models
Xiang Ji
Sanjeev Kulkarni
Mengdi Wang
Tengyang Xie
OffRL
35
4
0
06 Jun 2024
Principled Preferential Bayesian Optimization
Wenjie Xu
Wenbin Wang
Yuning Jiang
B. Svetozarevic
Colin N. Jones
14
6
0
08 Feb 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,730
0
04 Mar 2022
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
275
1,561
0
18 Sep 2019
Improving a Neural Semantic Parser by Counterfactual Learning from Human Bandit Feedback
Carolin (Haas) Lawrence
Stefan Riezler
OffRL
171
56
0
03 May 2018