Humans are not Boltzmann Distributions: Challenges and Opportunities for Modelling Human Feedback and Interaction in Reinforcement Learning

27 June 2022
David Lindner, Mennatallah El-Assady
    OffRL

Papers citing "Humans are not Boltzmann Distributions: Challenges and Opportunities for Modelling Human Feedback and Interaction in Reinforcement Learning"

12 / 12 papers shown
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Kai Ye, Hongyi Zhou, Jin Zhu, Francesco Quinzan, C. Shi
03 Apr 2025

On the Effect of Robot Errors on Human Teaching Dynamics
Jindan Huang, Isaac S. Sheidlower, Reuben M. Aronson, E. Short
15 Sep 2024

Mutual Theory of Mind in Human-AI Collaboration: An Empirical Study with LLM-driven AI Agents in a Real-time Shared Workspace Task
Shao Zhang, Xihuai Wang, Wenhao Zhang, Yongshan Chen, Landi Gao, Dakuo Wang, Weinan Zhang, Xinbing Wang, Ying Wen
LLMAG
13 Sep 2024

Towards Trustworthy AI: A Review of Ethical and Robust Large Language Models
Meftahul Ferdaus, Mahdi Abdelguerfi, Elias Ioup, Kendall N. Niles, Ken Pathak, Steve Sloan
01 Jun 2024

Direct Preference Optimization With Unobserved Preference Heterogeneity
Keertana Chidambaram, Karthik Vinay Seetharaman, Vasilis Syrgkanis
23 May 2024

Impact of Preference Noise on the Alignment Performance of Generative Language Models
Yang Gao, Dana Alon, Donald Metzler
15 Apr 2024

Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback
Yifu Yuan, Jianye Hao, Yi-An Ma, Zibin Dong, Hebin Liang, Jinyi Liu, Zhixin Feng, Kai-Wen Zhao, Yan Zheng
OffRL, ALM
04 Feb 2024

Towards Understanding Sycophancy in Language Models
Mrinank Sharma, Meg Tong, Tomasz Korbak, D. Duvenaud, Amanda Askell, ..., Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez
20 Oct 2023

RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback
Yannick Metz, David Lindner, Raphael Baur, Daniel A. Keim, Mennatallah El-Assady
AI4CE
08 Aug 2023

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper, Xander Davies, Claudia Shi, T. Gilbert, Jérémy Scheurer, ..., Erdem Biyik, Anca Dragan, David M. Krueger, Dorsa Sadigh, Dylan Hadfield-Menell
ALM, OffRL
27 Jul 2023

Explore, Establish, Exploit: Red Teaming Language Models from Scratch
Stephen Casper, Jason Lin, Joe Kwon, Gatlen Culp, Dylan Hadfield-Menell
AAML
15 Jun 2023

Reward (Mis)design for Autonomous Driving
W. B. Knox, A. Allievi, Holger Banzhaf, Felix Schmitt, Peter Stone
28 Apr 2021