ResearchTrend.AI
Off-Policy Evaluation for Human Feedback


11 October 2023
Qitong Gao
Ge Gao
Juncheng Dong
Vahid Tarokh
Min Chi
Miroslav Pajic
    OffRL

Papers citing "Off-Policy Evaluation for Human Feedback"

6 / 6 papers shown
OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models
Junda Wu
Xintong Li
Ruoyu Wang
Yu Xia
Yuxin Xiong
...
Xiang Chen
B. Kveton
Lina Yao
Jingbo Shang
Julian McAuley
OffRL
LRM
29
0
0
31 Oct 2024
Beyond Simple Sum of Delayed Rewards: Non-Markovian Reward Modeling for Reinforcement Learning
Yuting Tang
Xin-Qiang Cai
Jing-Cheng Pang
Qiyu Wu
Yao-Xiang Ding
Masashi Sugiyama
OffRL
26
0
0
26 Oct 2024
Off-Policy Selection for Initiating Human-Centric Experimental Design
Ge Gao
Xi Yang
Qitong Gao
Song Ju
Miroslav Pajic
Min Chi
OffRL
26
0
0
26 Oct 2024
Offline RL for Natural Language Generation with Implicit Language Q Learning
Charles Burton Snell
Ilya Kostrikov
Yi Su
Mengjiao Yang
Sergey Levine
OffRL
121
101
0
05 Jun 2022
COMBO: Conservative Offline Model-Based Policy Optimization
Tianhe Yu
Aviral Kumar
Rafael Rafailov
Aravind Rajeswaran
Sergey Levine
Chelsea Finn
OffRL
214
413
0
16 Feb 2021
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
275
1,583
0
18 Sep 2019