2407.02119
Cited By
Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning
2 July 2024
Yifang Chen
Shuohang Wang
Ziyi Yang
Hiteshi Sharma
Nikos Karampatziakis
Donghan Yu
Kevin G. Jamieson
Simon Shaolei Du
Yelong Shen
OffRL
Papers citing
"Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning"
8 / 8 papers shown
Direct Advantage Regression: Aligning LLMs with Online AI Reward
Li He
He Zhao
Stephen Wan
Dadong Wang
Lina Yao
Tongliang Liu
27
0
0
19 Apr 2025
Active Learning for Direct Preference Optimization
B. Kveton
Xintong Li
Julian McAuley
Ryan Rossi
Jingbo Shang
Junda Wu
Tong Yu
48
1
0
03 Mar 2025
Preference Elicitation for Offline Reinforcement Learning
Alizée Pace
Bernhard Schölkopf
Gunnar Rätsch
Giorgia Ramponi
OffRL
41
1
0
26 Jun 2024
Bootstrapping Language Models with DPO Implicit Rewards
Changyu Chen
Zichen Liu
Chao Du
Tianyu Pang
Qian Liu
Arunesh Sinha
Pradeep Varakantham
Min-Bin Lin
SyDa
ALM
60
22
0
14 Jun 2024
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Corby Rosset
Ching-An Cheng
Arindam Mitra
Michael Santacroce
Ahmed Hassan Awadallah
Tengyang Xie
144
113
0
04 Apr 2024
RewardBench: Evaluating Reward Models for Language Modeling
Nathan Lambert
Valentina Pyatkin
Jacob Morrison
Lester James Validad Miranda
Bill Yuchen Lin
...
Sachin Kumar
Tom Zick
Yejin Choi
Noah A. Smith
Hanna Hajishirzi
ALM
62
210
0
20 Mar 2024
Self-Rewarding Language Models
Weizhe Yuan
Richard Yuanzhe Pang
Kyunghyun Cho
Xian Li
Sainbayar Sukhbaatar
Jing Xu
Jason Weston
ReLM
SyDa
ALM
LRM
215
291
0
18 Jan 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022