Dueling Posterior Sampling for Preference-Based Reinforcement Learning


4 August 2019
Ellen R. Novoseller
Yibing Wei
Yanan Sui
Yisong Yue
J. W. Burdick
arXiv:1908.01289

Papers citing "Dueling Posterior Sampling for Preference-Based Reinforcement Learning"

13 / 13 papers shown
Contextual Online Uncertainty-Aware Preference Learning for Human Feedback
Nan Lu
Ethan X. Fang
Junwei Lu
60
0
0
27 Apr 2025
Reinforcement Learning from Multi-level and Episodic Human Feedback
Muhammad Qasim Elahi
Somtochukwu Oguchienti
Maheed H. Ahmed
Mahsa Ghasemi
OffRL
44
0
0
20 Apr 2025
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang
Bingcong Li
Christoph Dann
Niao He
OffRL
74
0
0
26 Feb 2025
Preference-Guided Reinforcement Learning for Efficient Exploration
Guojian Wang
Faguo Wu
Xiao Zhang
Tianyuan Chen
Xuyang Chen
Lin Zhao
25
0
0
09 Jul 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
38
1
0
11 Jun 2024
Comparisons Are All You Need for Optimizing Smooth Functions
Chenyi Zhang
Tongyang Li
AAML
21
1
0
19 May 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
21
22
0
29 Jan 2024
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
Gokul Swamy
Christoph Dann
Rahul Kidambi
Zhiwei Steven Wu
Alekh Agarwal
OffRL
22
94
0
08 Jan 2024
A Learning-Based Framework for Safe Human-Robot Collaboration with Multiple Backup Control Barrier Functions
Neil C. Janwani
Ersin Daş
Thomas Touma
Skylar X. Wei
Tamas G. Molnar
J. W. Burdick
16
2
0
09 Oct 2023
Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism
Zihao Li
Zhuoran Yang
Mengdi Wang
OffRL
16
51
0
29 May 2023
Learning from Imperfect Demonstrations via Adversarial Confidence Transfer
Zhangjie Cao
Zihan Wang
Dorsa Sadigh
AAML
19
7
0
07 Feb 2022
Early Detection of Combustion Instabilities using Deep Convolutional Selective Autoencoders on Hi-speed Flame Video
Chandrayee Basu
Qian Yang
M. Singhal
Anca Dragan
49
174
0
25 Mar 2016
Online Structured Prediction via Coactive Learning
Pannagadatta K. Shivaswamy
Thorsten Joachims
HAI
63
66
0
18 May 2012