Don't Forget Your Reward Values: Language Model Alignment via Value-based Calibration
25 February 2024
Xin Mao, Fengming Li, Huimin Xu, Wei Zhang, A. Luu
ALM
Papers citing "Don't Forget Your Reward Values: Language Model Alignment via Value-based Calibration"
7 / 7 papers shown

Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance
24 Feb 2025
Chenghua Huang, Lu Wang, Fangkai Yang, Pu Zhao, Z. Li, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang
OffRL

Towards a Unified View of Preference Learning for Large Language Models: A Survey
04 Sep 2024
Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Z. Yang, ..., Houfeng Wang, Zhifang Sui, Peiyi Wang, Baobao Chang

A Survey on Human Preference Learning for Large Language Models
17 Jun 2024
Ruili Jiang, Kehai Chen, Xuefeng Bai, Zhixuan He, Juntao Li, Muyun Yang, Tiejun Zhao, Liqiang Nie, Min Zhang

Robust Preference Optimization through Reward Model Distillation
29 May 2024
Adam Fisch, Jacob Eisenstein, Vicky Zayats, Alekh Agarwal, Ahmad Beirami, Chirag Nagpal, Peter Shaw, Jonathan Berant

AlphaMath Almost Zero: process Supervision without process
06 May 2024
Guoxin Chen, Minpeng Liao, Chengxi Li, Kai Fan
AIMat, LRM

REBEL: Reinforcement Learning via Regressing Relative Rewards
25 Apr 2024
Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun
OffRL

Training language models to follow instructions with human feedback
04 Mar 2022
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM