Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.00957
Cited By
Preference Transformer: Modeling Human Preferences using Transformers for RL
2 March 2023
Changyeon Kim
Jongjin Park
Jinwoo Shin
Honglak Lee
Pieter Abbeel
Kimin Lee
OffRL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Preference Transformer: Modeling Human Preferences using Transformers for RL"
50 / 53 papers shown
Title
TREND: Tri-teaching for Robust Preference-based Reinforcement Learning with Demonstrations
Shuaiyi Huang
Mara Levy
Anubhav Gupta
Daniel Ekpo
Ruijie Zheng
Abhinav Shrivastava
19
0
0
09 May 2025
Contextual Online Uncertainty-Aware Preference Learning for Human Feedback
Nan Lu
Ethan X. Fang
Junwei Lu
58
0
0
27 Apr 2025
FLoRA: Sample-Efficient Preference-based RL via Low-Rank Style Adaptation of Reward Functions
Daniel Marta
Simon Holk
Miguel Vasco
Jens Lundell
Timon Homberger
F. L. Busch
Olov Andersson
Danica Kragic
Iolanda Leite
30
0
0
14 Apr 2025
Latent Embedding Adaptation for Human Preference Alignment in Diffusion Planners
Wen Zheng Terence Ng
Jianda Chen
Yuan Xu
Tianwei Zhang
37
0
0
24 Mar 2025
Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning
Hyungkyu Kang
Min-hwan Oh
OffRL
39
0
0
07 Mar 2025
Human Implicit Preference-Based Policy Fine-tuning for Multi-Agent Reinforcement Learning in USV Swarm
H. Kim
Kanghoon Lee
J. Park
Jiachen Li
Jinkyoo Park
58
1
0
05 Mar 2025
DPR: Diffusion Preference-based Reward for Offline Reinforcement Learning
Teng Pang
Bingzheng Wang
Guoqiang Wu
Yilong Yin
OffRL
65
0
0
03 Mar 2025
Behavior Preference Regression for Offline Reinforcement Learning
Padmanaba Srinivasan
William J. Knottenbelt
OffRL
31
0
0
02 Mar 2025
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
Yueqin Yin
Shentao Yang
Yujia Xie
Ziyi Yang
Yuting Sun
Hany Awadalla
Weizhu Chen
Mingyuan Zhou
48
0
0
07 Jan 2025
LEASE: Offline Preference-based Reinforcement Learning with High Sample Efficiency
Xiao-Yin Liu
Guotao Li
Xiao-Hu Zhou
Z. Hou
OffRL
34
0
0
31 Dec 2024
Approximated Variational Bayesian Inverse Reinforcement Learning for Large Language Model Alignment
Yuang Cai
Yuyu Yuan
Jinsheng Shi
Qinhong Lin
28
0
0
14 Nov 2024
Beyond Simple Sum of Delayed Rewards: Non-Markovian Reward Modeling for Reinforcement Learning
Yuting Tang
Xin-Qiang Cai
Jing-Cheng Pang
Qiyu Wu
Yao-Xiang Ding
Masashi Sugiyama
OffRL
21
0
0
26 Oct 2024
Evolutionary Retrofitting
Mathurin Videau
M. Zameshina
Alessandro Leite
Laurent Najman
Marc Schoenauer
O. Teytaud
28
0
0
15 Oct 2024
UNIQ: Offline Inverse Q-learning for Avoiding Undesirable Demonstrations
Huy Hoang
Tien Mai
Pradeep Varakantham
OffRL
22
0
0
10 Oct 2024
Offline Inverse Constrained Reinforcement Learning for Safe-Critical Decision Making in Healthcare
Nan Fang
Guiliang Liu
Wei Gong
OffRL
30
0
0
10 Oct 2024
Task-agnostic Pre-training and Task-guided Fine-tuning for Versatile Diffusion Planner
Chenyou Fan
Chenjia Bai
Zhao Shan
Haoran He
Yang Zhang
Zhen Wang
28
3
0
30 Sep 2024
Multi-Type Preference Learning: Empowering Preference-Based Reinforcement Learning with Equal Preferences
Z. Liu
Junjie Xu
Xingjiao Wu
J. Yang
Liang He
21
0
0
11 Sep 2024
Forward KL Regularized Preference Optimization for Aligning Diffusion Policies
Zhao Shan
Chenyou Fan
Shuang Qiu
Jiyuan Shi
Chenjia Bai
33
4
0
09 Sep 2024
Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
S. Poddar
Yanming Wan
Hamish Ivison
Abhishek Gupta
Natasha Jaques
27
33
0
19 Aug 2024
Listwise Reward Estimation for Offline Preference-based Reinforcement Learning
Heewoong Choi
Sangwon Jung
Hongjoon Ahn
Taesup Moon
OffRL
34
2
0
08 Aug 2024
Hindsight Preference Learning for Offline Preference-based Reinforcement Learning
Chen-Xiao Gao
Shengjun Fang
Chenjun Xiao
Yang Yu
Zongzhang Zhang
OffRL
25
0
0
05 Jul 2024
Improving Sample Efficiency of Reinforcement Learning with Background Knowledge from Large Language Models
Fuxiang Zhang
Junyou Li
Yi-Chen Li
Zongzhang Zhang
Yang Yu
Deheng Ye
OffRL
KELM
39
1
0
04 Jul 2024
Preference Elicitation for Offline Reinforcement Learning
Alizée Pace
Bernhard Schölkopf
Gunnar Rätsch
Giorgia Ramponi
OffRL
50
1
0
26 Jun 2024
Order-Optimal Instance-Dependent Bounds for Offline Reinforcement Learning with Preference Feedback
Zhirui Chen
Vincent Y. F. Tan
OffRL
29
0
0
18 Jun 2024
Dishonesty in Helpful and Harmless Alignment
Youcheng Huang
Jingkun Tang
Duanyu Feng
Zheng-Wei Zhang
Wenqiang Lei
Jiancheng Lv
Anthony G. Cohn
LLMSV
25
3
0
04 Jun 2024
Preference Alignment with Flow Matching
Minu Kim
Yongsik Lee
Sehyeok Kang
Jihwan Oh
Song Chong
Seyoung Yun
19
1
0
30 May 2024
Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation
Fengshuo Bai
Rui Zhao
Hongming Zhang
Sijia Cui
Ying Wen
Yaodong Yang
Bo Xu
Lei Han
OffRL
14
6
0
29 May 2024
Hindsight PRIORs for Reward Learning from Human Preferences
Mudit Verma
Katherine Metcalf
32
5
0
12 Apr 2024
Regularized Conditional Diffusion Model for Multi-Task Preference Alignment
Xudong Yu
Chenjia Bai
Haoran He
Changhong Wang
Xuelong Li
24
6
0
07 Apr 2024
Learning Human Preferences Over Robot Behavior as Soft Planning Constraints
Austin Narcomey
Nathan Tsoi
Ruta Desai
Marynel Vázquez
28
3
0
28 Mar 2024
RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences
Jie Cheng
Gang Xiong
Xingyuan Dai
Q. Miao
Yisheng Lv
Fei-Yue Wang
23
14
0
27 Feb 2024
DynaMITE-RL: A Dynamic Model for Improved Temporal Meta-Reinforcement Learning
Anthony Liang
Guy Tennenholtz
Chih-Wei Hsu
Yinlam Chow
Erdem Biyik
Craig Boutilier
OffRL
29
1
0
25 Feb 2024
A Dense Reward View on Aligning Text-to-Image Diffusion with Preference
Shentao Yang
Tianqi Chen
Mingyuan Zhou
EGVM
21
22
0
13 Feb 2024
Reinforcement Learning from Bagged Reward
Yuting Tang
Xin-Qiang Cai
Yao-Xiang Ding
Qiyu Wu
Guoqing Liu
Masashi Sugiyama
OffRL
11
0
0
06 Feb 2024
Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback
Yifu Yuan
Jianye Hao
Yi-An Ma
Zibin Dong
Hebin Liang
Jinyi Liu
Zhixin Feng
Kai-Wen Zhao
Yan Zheng
OffRL
ALM
11
14
0
04 Feb 2024
Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations
Zilin Wang
Hao-Wen Zhuang
Lu Li
Yinmin Zhang
Junjie Zhong
Jun Chen
Yu Yang
Boshi Tang
Zhiyong Wu
37
3
0
18 Dec 2023
Conditions on Preference Relations that Guarantee the Existence of Optimal Policies
Jonathan Colaco Carr
Prakash Panangaden
Doina Precup
11
1
0
03 Nov 2023
Learning to Discern: Imitating Heterogeneous Human Demonstrations with Preference and Representation Learning
Sachit Kuhar
Shuo Cheng
Shivang Chopra
Matthew Bronars
Danfei Xu
27
8
0
22 Oct 2023
Contrastive Preference Learning: Learning from Human Feedback without RL
Joey Hejna
Rafael Rafailov
Harshit S. Sikchi
Chelsea Finn
S. Niekum
W. B. Knox
Dorsa Sadigh
OffRL
9
49
0
20 Oct 2023
Safe RLHF: Safe Reinforcement Learning from Human Feedback
Josef Dai
Xuehai Pan
Ruiyang Sun
Jiaming Ji
Xinbo Xu
Mickel Liu
Yizhou Wang
Yaodong Yang
11
279
0
19 Oct 2023
Guide Your Agent with Adaptive Multimodal Rewards
Changyeon Kim
Younggyo Seo
Hao Liu
Lisa Lee
Jinwoo Shin
Honglak Lee
Kimin Lee
16
9
0
19 Sep 2023
Transformers in Reinforcement Learning: A Survey
Pranav Agarwal
A. Rahman
P. St-Charles
Simon J. D. Prince
Samira Ebrahimi Kahou
OffRL
8
18
0
12 Jul 2023
PEARL: Zero-shot Cross-task Preference Alignment and Robust Reward Learning for Robotic Manipulation
Runze Liu
Yali Du
Fengshuo Bai
Jiafei Lyu
Xiu Li
9
6
0
06 Jun 2023
Preference-grounded Token-level Guidance for Language Model Fine-tuning
Shentao Yang
Shujian Zhang
Congying Xia
Yihao Feng
Caiming Xiong
Mi Zhou
21
22
0
01 Jun 2023
Beyond Reward: Offline Preference-guided Policy Optimization
Yachen Kang
Dingxu Shi
Jinxin Liu
Li He
Donglin Wang
OffRL
17
31
0
25 May 2023
Inverse Preference Learning: Preference-based RL without a Reward Function
Joey Hejna
Dorsa Sadigh
OffRL
13
47
0
24 May 2023
Policy Representation via Diffusion Probability Model for Reinforcement Learning
Long Yang
Zhixiong Huang
Fenghao Lei
Yucun Zhong
Yiming Yang
Cong Fang
Shiting Wen
Binbin Zhou
Zhouchen Lin
DiffM
13
37
0
22 May 2023
Direct Preference-based Policy Optimization without Reward Modeling
Gaon An
Junhyeok Lee
Xingdong Zuo
Norio Kosaka
KyungHyun Kim
Hyun Oh Song
OffRL
19
24
0
30 Jan 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Offline Reinforcement Learning with Implicit Q-Learning
Ilya Kostrikov
Ashvin Nair
Sergey Levine
OffRL
206
832
0
12 Oct 2021
1
2
Next