B-Pref: Benchmarking Preference-Based Reinforcement Learning


4 November 2021
Kimin Lee
Laura M. Smith
Anca Dragan
Pieter Abbeel
    OffRL

Papers citing "B-Pref: Benchmarking Preference-Based Reinforcement Learning"

50 / 69 papers shown
TREND: Tri-teaching for Robust Preference-based Reinforcement Learning with Demonstrations
Shuaiyi Huang
Mara Levy
Anubhav Gupta
Daniel Ekpo
Ruijie Zheng
Abhinav Shrivastava
23
0
0
09 May 2025
Online Preference-based Reinforcement Learning with Self-augmented Feedback from Large Language Model
Songjun Tu
Jingbo Sun
Qichao Zhang
Xiangyuan Lan
Dongbin Zhao
67
1
0
22 Dec 2024
Navigating Noisy Feedback: Enhancing Reinforcement Learning with Error-Prone Language Models
Muhan Lin
Shuyang Shi
Yue (Sophie) Guo
Behdad Chalaki
Vaishnav Tadiparthi
Ehsan Moradi-Pari
Simon Stepputtis
Joseph Campbell
Katia P. Sycara
33
1
0
22 Oct 2024
UNIQ: Offline Inverse Q-learning for Avoiding Undesirable Demonstrations
Huy Hoang
Tien Mai
Pradeep Varakantham
OffRL
22
0
0
10 Oct 2024
Reinforcement Learning From Imperfect Corrective Actions And Proxy Rewards
Zhaohui Jiang
Xuening Feng
Paul Weng
Yifei Zhu
Yan Song
Tianze Zhou
Yujing Hu
Tangjie Lv
Changjie Fan
31
0
0
08 Oct 2024
Human-Robot Cooperative Distribution Coupling for Hamiltonian-Constrained Social Navigation
Weizheng Wang
Chao Yu
Yu Wang
Byung-Cheol Min
57
2
0
20 Sep 2024
Multi-Type Preference Learning: Empowering Preference-Based Reinforcement Learning with Equal Preferences
Z. Liu
Junjie Xu
Xingjiao Wu
J. Yang
Liang He
26
0
0
11 Sep 2024
Forward KL Regularized Preference Optimization for Aligning Diffusion Policies
Zhao Shan
Chenyou Fan
Shuang Qiu
Jiyuan Shi
Chenjia Bai
33
4
0
09 Sep 2024
ELO-Rated Sequence Rewards: Advancing Reinforcement Learning Models
Qi Ju
Falin Hei
Zhemei Fang
Yunfeng Luo
14
0
0
05 Sep 2024
Advances in Preference-based Reinforcement Learning: A Review
Youssef Abdelkareem
Shady Shehata
Fakhri Karray
OffRL
35
9
0
21 Aug 2024
Listwise Reward Estimation for Offline Preference-based Reinforcement Learning
Heewoong Choi
Sangwon Jung
Hongjoon Ahn
Taesup Moon
OffRL
34
2
0
08 Aug 2024
Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control
Huayu Chen
Kaiwen Zheng
Hang Su
Jun Zhu
32
1
0
12 Jul 2024
Robust Reinforcement Learning from Corrupted Human Feedback
Alexander Bukharin
Ilgee Hong
Haoming Jiang
Zichong Li
Qingru Zhang
Zixuan Zhang
Tuo Zhao
31
4
0
21 Jun 2024
Pareto-Optimal Learning from Preferences with Hidden Context
Ryan Boldi
Li Ding
Lee Spector
S. Niekum
54
6
0
21 Jun 2024
RRLS : Robust Reinforcement Learning Suite
Adil Zouitine
David Bertoin
Pierre Clavier
Matthieu Geist
Emmanuel Rachelson
OffRL
22
1
0
12 Jun 2024
Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity
Calarina Muslimani
Bram Grooten
Deepak Ranganatha Sastry Mamillapalli
Mykola Pechenizkiy
D. Mocanu
M. E. Taylor
43
0
0
10 Jun 2024
Direct Preference Optimization With Unobserved Preference Heterogeneity
Keertana Chidambaram
Karthik Vinay Seetharaman
Vasilis Syrgkanis
36
7
0
23 May 2024
Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning
Calarina Muslimani
M. E. Taylor
OffRL
38
2
0
30 Apr 2024
Optimal Design for Human Feedback
Subhojyoti Mukherjee
Anusha Lalitha
Kousha Kalantari
Aniket Deshmukh
Ge Liu
Yifei Ma
B. Kveton
31
0
0
22 Apr 2024
Impact of Preference Noise on the Alignment Performance of Generative Language Models
Yang Gao
Dana Alon
Donald Metzler
21
15
0
15 Apr 2024
Hindsight PRIORs for Reward Learning from Human Preferences
Mudit Verma
Katherine Metcalf
35
5
0
12 Apr 2024
Learning Human Preferences Over Robot Behavior as Soft Planning Constraints
Austin Narcomey
Nathan Tsoi
Ruta Desai
Marynel Vázquez
36
3
0
28 Mar 2024
Online Policy Learning from Offline Preferences
Guoxi Zhang
Han Bao
Hisashi Kashima
OffRL
27
0
0
15 Mar 2024
Sample-Efficient Preference-based Reinforcement Learning with Dynamics Aware Rewards
Katherine Metcalf
Miguel Sarabia
Natalie Mackraz
B. Theobald
19
5
0
28 Feb 2024
RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences
Jie Cheng
Gang Xiong
Xingyuan Dai
Q. Miao
Yisheng Lv
Fei-Yue Wang
26
14
0
27 Feb 2024
Batch Active Learning of Reward Functions from Human Preferences
Erdem Biyik
Nima Anari
Dorsa Sadigh
27
6
0
24 Feb 2024
MENTOR: Guiding Hierarchical Reinforcement Learning with Human Feedback and Dynamic Distance Constraint
Xinglin Zhou
Yifu Yuan
Shaofu Yang
Jianye Hao
21
1
0
22 Feb 2024
Corruption Robust Offline Reinforcement Learning with Human Feedback
Debmalya Mandal
Andi Nika
Parameswaran Kamalaruban
Adish Singla
Goran Radanović
OffRL
28
8
0
09 Feb 2024
RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback
Yufei Wang
Zhanyi Sun
Jesse Zhang
Zhou Xian
Erdem Biyik
David Held
Zackory M. Erickson
VLM
53
48
0
06 Feb 2024
Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback
Yifu Yuan
Jianye Hao
Yi-An Ma
Zibin Dong
Hebin Liang
Jinyi Liu
Zhixin Feng
Kai-Wen Zhao
Yan Zheng
OffRL
ALM
11
14
0
04 Feb 2024
Crowd-PrefRL: Preference-Based Reward Learning from Crowds
David Chhan
Ellen R. Novoseller
Vernon J. Lawhern
27
5
0
17 Jan 2024
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
Gokul Swamy
Christoph Dann
Rahul Kidambi
Zhiwei Steven Wu
Alekh Agarwal
OffRL
28
94
0
08 Jan 2024
Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
Anand Siththaranjan
Cassidy Laidlaw
Dylan Hadfield-Menell
13
54
0
13 Dec 2023
Agent-Aware Training for Agent-Agnostic Action Advising in Deep Reinforcement Learning
Yaoquan Wei
Shunyu Liu
Jie Song
Tongya Zheng
Kaixuan Chen
Yong Wang
Mingli Song
11
0
0
28 Nov 2023
Autonomous Robotic Reinforcement Learning with Asynchronous Human Feedback
Max Balsells
M. Torné
Zihan Wang
Samedh Desai
Pulkit Agrawal
Abhishek Gupta
37
10
0
31 Oct 2023
Learning Reward for Physical Skills using Large Language Model
Yuwei Zeng
Yiqing Xu
23
6
0
21 Oct 2023
Dynamic value alignment through preference aggregation of multiple objectives
Marcin Korecki
Damian Dailisan
Cesare Carissimo
20
0
0
09 Oct 2023
Learning Optimal Advantage from Preferences and Mistaking it for Reward
W. B. Knox
Stephane Hatgis-Kessell
Sigurdur O. Adalgeirsson
Serena Booth
Anca D. Dragan
Peter Stone
S. Niekum
14
12
0
03 Oct 2023
Motif: Intrinsic Motivation from Artificial Intelligence Feedback
Martin Klissarov
P. D'Oro
Shagun Sodhani
Roberta Raileanu
Pierre-Luc Bacon
Pascal Vincent
Amy Zhang
Mikael Henaff
LRM
LLMAG
16
54
0
29 Sep 2023
Rating-based Reinforcement Learning
Devin White
Mingkang Wu
Ellen R. Novoseller
Vernon J. Lawhern
Nicholas R. Waytowich
Yongcan Cao
ALM
11
6
0
30 Jul 2023
STRAPPER: Preference-based Reinforcement Learning via Self-training Augmentation and Peer Regularization
Yachen Kang
Li He
Jinxin Liu
Zifeng Zhuang
Donglin Wang
20
0
0
19 Jul 2023
Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores
Shukai Liu
Chenming Wu
Ying Li
Liang Zhang
16
0
0
11 Jul 2023
PEARL: Zero-shot Cross-task Preference Alignment and Robust Reward Learning for Robotic Manipulation
Runze Liu
Yali Du
Fengshuo Bai
Jiafei Lyu
Xiu Li
11
6
0
06 Jun 2023
Query-Policy Misalignment in Preference-Based Reinforcement Learning
Xiao Hu
Jianxiong Li
Xianyuan Zhan
Qing-Shan Jia
Ya-Qin Zhang
11
8
0
27 May 2023
Learning Interpretable Models of Aircraft Handling Behaviour by Reinforcement Learning from Human Feedback
Tom Bewley
J. Lawry
Arthur G. Richards
25
1
0
26 May 2023
Inverse Preference Learning: Preference-based RL without a Reward Function
Joey Hejna
Dorsa Sadigh
OffRL
19
47
0
24 May 2023
Maximum Causal Entropy Inverse Constrained Reinforcement Learning
Mattijs Baert
Pietro Mazzaglia
Sam Leroux
Pieter Simoens
CML
27
10
0
04 May 2023
NaviSTAR: Socially Aware Robot Navigation with Hybrid Spatio-Temporal Graph Transformer and Preference Learning
Weizheng Wang
Ruiqi Wang
Le Mao
Byung-Cheol Min
32
13
0
12 Apr 2023
Preference Transformer: Modeling Human Preferences using Transformers for RL
Changyeon Kim
Jongjin Park
Jinwoo Shin
Honglak Lee
Pieter Abbeel
Kimin Lee
OffRL
23
60
0
02 Mar 2023
Complex QA and language models hybrid architectures, Survey
Xavier Daull
P. Bellot
Emmanuel Bruno
Vincent Martin
Elisabeth Murisasco
ELM
19
15
0
17 Feb 2023