Learning from Imperfect Human Feedback: a Tale from Corruption-Robust Dueling
18 May 2024
Yuwei Cheng, Fan Yao, Xuefeng Liu, Haifeng Xu
arXiv:2405.11204

Papers citing "Learning from Imperfect Human Feedback: a Tale from Corruption-Robust Dueling"

2 papers shown

Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Kai Ye, Hongyi Zhou, Jin Zhu, Francesco Quinzan, C. Shi
03 Apr 2025

Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions
Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu
13 May 2022