Aligning Language Models with Human Preferences via a Bayesian Approach
Jiashuo Wang, Haozhao Wang, Shichao Sun, Wenjie Li
arXiv:2310.05782 · 9 October 2023 · ALM
Papers citing "Aligning Language Models with Human Preferences via a Bayesian Approach" (9 of 9 shown)

Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
  Kai Ye, Hongyi Zhou, Jin Zhu, Francesco Quinzan, C. Shi
  03 Apr 2025

Aligning Crowd-sourced Human Feedback for Reinforcement Learning on Code Generation by Large Language Models
  M. Wong, C. Tan
  19 Mar 2025 · ALM

E2CL: Exploration-based Error Correction Learning for Embodied Agents
  Hanlin Wang, Chak Tou Leong, Jian Wang, Wenjie Li
  05 Sep 2024

StyEmp: Stylizing Empathetic Response Generation via Multi-Grained Prefix Encoder and Personality Reinforcement
  Yahui Fu, Chenhui Chu, Tatsuya Kawahara
  05 Aug 2024

Improving alignment of dialogue agents via targeted human judgements
  Amelia Glaese, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, ..., John F. J. Mellor, Demis Hassabis, Koray Kavukcuoglu, Lisa Anne Hendricks, G. Irving
  28 Sep 2022 · ALM, AAML

Training language models to follow instructions with human feedback
  Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
  04 Mar 2022 · OSLM, ALM

Trustworthy AI: From Principles to Practices
  Bo-wen Li, Peng Qi, Bo Liu, Shuai Di, Jingen Liu, Jiquan Pei, Jinfeng Yi, Bowen Zhou
  04 Oct 2021

Agreeing to Disagree: Annotating Offensive Language Datasets with Annotators' Disagreement
  Elisa Leonardelli, Stefano Menini, Alessio Palmero Aprosio, Marco Guerini, Sara Tonelli
  28 Sep 2021

Fine-Tuning Language Models from Human Preferences
  Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving
  18 Sep 2019 · ALM