Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.13639
Cited By
Contrastive Preference Learning: Learning from Human Feedback without RL
20 October 2023
Joey Hejna
Rafael Rafailov
Harshit S. Sikchi
Chelsea Finn
S. Niekum
W. B. Knox
Dorsa Sadigh
OffRL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Contrastive Preference Learning: Learning from Human Feedback without RL"
16 / 16 papers shown
Title
Optimal Interactive Learning on the Job via Facility Location Planning
Shivam Vats
Michelle Zhao
Patrick Callaghan
Mingxi Jia
Maxim Likhachev
Oliver Kroemer
George Konidaris
29
0
0
01 May 2025
Inverse-RLignment: Large Language Model Alignment from Demonstrations through Inverse Reinforcement Learning
Hao Sun
M. Schaar
82
14
0
28 Jan 2025
Understanding Layer Significance in LLM Alignment
Guangyuan Shi
Zexin Lu
Xiaoyu Dong
Wenlong Zhang
Xuanyu Zhang
Yujie Feng
Xiao-Ming Wu
45
2
0
23 Oct 2024
DeformPAM: Data-Efficient Learning for Long-horizon Deformable Object Manipulation via Preference-based Action Alignment
Wendi Chen
Han Xue
Fangyuan Zhou
Yuan Fang
Cewu Lu
39
0
0
15 Oct 2024
Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning
Jifan Zhang
Lalit P. Jain
Yang Guo
Jiayi Chen
Kuan Lok Zhou
...
Scott Sievert
Timothy Rogers
Kevin Jamieson
Robert Mankoff
Robert Nowak
29
5
0
15 Jun 2024
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts
Minghao Wu
Jiahao Xu
Yulin Yuan
Gholamreza Haffari
Longyue Wang
Weihua Luo
Kaifu Zhang
LLMAG
114
22
0
20 May 2024
Heterogeneous Contrastive Learning for Foundation Models and Beyond
Lecheng Zheng
Baoyu Jing
Zihao Li
Hanghang Tong
Jingrui He
VLM
24
18
0
30 Mar 2024
On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models
Xinpeng Wang
Shitong Duan
Xiaoyuan Yi
Jing Yao
Shanlin Zhou
Zhihua Wei
Peng Zhang
Dongkuan Xu
Maosong Sun
Xing Xie
OffRL
33
16
0
07 Mar 2024
YODA: Teacher-Student Progressive Learning for Language Models
Jianqiao Lu
Wanjun Zhong
Yufei Wang
Zhijiang Guo
Qi Zhu
...
Baojun Wang
Yasheng Wang
Lifeng Shang
Xin Jiang
Qun Liu
LRM
14
6
0
28 Jan 2024
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
Gokul Swamy
Christoph Dann
Rahul Kidambi
Zhiwei Steven Wu
Alekh Agarwal
OffRL
28
94
0
08 Jan 2024
Distance Weighted Supervised Learning for Offline Interaction Data
Joey Hejna
Jensen Gao
Dorsa Sadigh
OffRL
31
12
0
26 Apr 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,730
0
04 Mar 2022
Offline Reinforcement Learning with Implicit Q-Learning
Ilya Kostrikov
Ashvin Nair
Sergey Levine
OffRL
206
832
0
12 Oct 2021
What Matters in Learning from Offline Human Demonstrations for Robot Manipulation
Ajay Mandlekar
Danfei Xu
J. Wong
Soroush Nasiriany
Chen Wang
Rohun Kulkarni
Li Fei-Fei
Silvio Savarese
Yuke Zhu
Roberto Martín-Martín
OffRL
147
461
0
06 Aug 2021
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Sergey Levine
Aviral Kumar
George Tucker
Justin Fu
OffRL
GP
329
1,944
0
04 May 2020
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
275
1,561
0
18 Sep 2019
1