arXiv: 2402.10038
RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models
15 February 2024
Saeed Khaki, JinJin Li, Lan Ma, Liu Yang, Prathap Ramachandra
Papers citing "RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models" (7 / 7 papers shown)
CasaGPT: Cuboid Arrangement and Scene Assembly for Interior Design
Weitao Feng, Hang Zhou, Jing Liao, Li Cheng, Wenbo Zhou
3DV
28 Apr 2025
The Crucial Role of Samplers in Online Direct Preference Optimization
Ruizhe Shi, Runlong Zhou, Simon S. Du
29 Sep 2024
Cascade Reward Sampling for Efficient Decoding-Time Alignment
Bolian Li, Yifan Wang, A. Grama, Ruqi Zhang
AI4TS
24 Jun 2024
Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process
Ermo Hua, Biqing Qi, Kaiyan Zhang, Yue Yu, Ning Ding, Xingtai Lv, Kai Tian, Bowen Zhou
20 May 2024
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM
04 Mar 2022
Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information
Kawin Ethayarajh, Yejin Choi, Swabha Swayamdipta
16 Oct 2021
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving
ALM
18 Sep 2019