ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.04283
  4. Cited By
Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model
  with Proxy

Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy

7 March 2024
Yu Zhu
Chuxiong Sun
Wenfei Yang
Wenqiang Wei
Bo Tang
Tianzhu Zhang
Zhiyu Li
Shifeng Zhang
Feiyu Xiong
Jie Hu
Mingchuan Yang
ArXivPDFHTML

Papers citing "Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy"

5 / 5 papers shown
Title
Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates
Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates
Hui Wei
Shenghua He
Tian Xia
Andy H. Wong
Jingyang Lin
Mei Han
Mei Han
ALM
ELM
54
22
0
23 Aug 2024
On the Transformations across Reward Model, Parameter Update, and
  In-Context Prompt
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt
Deng Cai
Huayang Li
Tingchen Fu
Siheng Li
Weiwen Xu
...
Leyang Cui
Yan Wang
Lemao Liu
Taro Watanabe
Shuming Shi
KELM
26
2
0
24 Jun 2024
UHGEval: Benchmarking the Hallucination of Chinese Large Language Models
  via Unconstrained Generation
UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation
Xun Liang
Shichao Song
Simin Niu
Zhiyu Li
Feiyu Xiong
...
Zhaohui Wy
Dawei He
Peng Cheng
Zhonghao Wang
Haiying Deng
HILM
27
18
0
26 Nov 2023
An Empirical Survey on Long Document Summarization: Datasets, Models and
  Metrics
An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics
Huan Yee Koh
Jiaxin Ju
Ming Liu
Shirui Pan
73
120
0
03 Jul 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
1