ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2402.17257
  4. Cited By
RIME: Robust Preference-based Reinforcement Learning with Noisy
  Preferences

RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences

27 February 2024
Jie Cheng
Gang Xiong
Xingyuan Dai
Q. Miao
Yisheng Lv
Fei-Yue Wang
ArXivPDFHTML

Papers citing "RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences"

16 / 16 papers shown
Title
TREND: Tri-teaching for Robust Preference-based Reinforcement Learning with Demonstrations
TREND: Tri-teaching for Robust Preference-based Reinforcement Learning with Demonstrations
Shuaiyi Huang
Mara Levy
Anubhav Gupta
Daniel Ekpo
Ruijie Zheng
Abhinav Shrivastava
19
0
0
09 May 2025
DAPPER: Discriminability-Aware Policy-to-Policy Preference-Based Reinforcement Learning for Query-Efficient Robot Skill Acquisition
DAPPER: Discriminability-Aware Policy-to-Policy Preference-Based Reinforcement Learning for Query-Efficient Robot Skill Acquisition
Yuki Kadokawa
Jonas Frey
Takahiro Miki
Takamitsu Matsubara
Marco Hutter
21
0
0
09 May 2025
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning
Jie Cheng
Ruixi Qiao
Lijun Li
Chao Guo
J. Z. Wang
Gang Xiong
Yisheng Lv
Fei-Yue Wang
LRM
56
0
0
21 Apr 2025
VARP: Reinforcement Learning from Vision-Language Model Feedback with Agent Regularized Preferences
VARP: Reinforcement Learning from Vision-Language Model Feedback with Agent Regularized Preferences
Anukriti Singh
Amisha Bhaskar
Peihong Yu
Souradip Chakraborty
Ruthwik Dasyam
Amrit Singh Bedi
Pratap Tokekar
48
0
0
18 Mar 2025
Strategyproof Reinforcement Learning from Human Feedback
Thomas Kleine Buening
Jiarui Gan
Debmalya Mandal
Marta Z. Kwiatkowska
47
0
0
13 Mar 2025
Toward Stable World Models: Measuring and Addressing World Instability in Generative Environments
Soonwoo Kwon
Jin-Young Kim
Hyojun Go
Kyungjune Baek
53
0
0
11 Mar 2025
Skill Expansion and Composition in Parameter Space
Skill Expansion and Composition in Parameter Space
Tenglong Liu
J. Li
Yinan Zheng
Haoyi Niu
Yixing Lan
Xin Xu
Xianyuan Zhan
51
4
0
09 Feb 2025
Preference VLM: Leveraging VLMs for Scalable Preference-Based Reinforcement Learning
Preference VLM: Leveraging VLMs for Scalable Preference-Based Reinforcement Learning
Udita Ghosh
Dripta S. Raychaudhuri
Jiachen Li
Konstantinos Karydis
A. Roy-Chowdhury
VLM
55
0
0
03 Feb 2025
Ada-K Routing: Boosting the Efficiency of MoE-based LLMs
Ada-K Routing: Boosting the Efficiency of MoE-based LLMs
Tongtian Yue
Longteng Guo
Jie Cheng
Xuange Gao
J. Liu
MoE
18
0
0
14 Oct 2024
Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining
Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining
Jie Cheng
Ruixi Qiao
Gang Xiong
Binhua Li
Yingwei Ma
Binhua Li
Yongbin Li
Yisheng Lv
OffRL
OnRL
LM&Ro
32
3
0
01 Oct 2024
Robust Reinforcement Learning from Corrupted Human Feedback
Robust Reinforcement Learning from Corrupted Human Feedback
Alexander Bukharin
Ilgee Hong
Haoming Jiang
Zichong Li
Qingru Zhang
Zixuan Zhang
Tuo Zhao
26
4
0
21 Jun 2024
Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning
Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning
Calarina Muslimani
M. E. Taylor
OffRL
38
2
0
30 Apr 2024
WARM: On the Benefits of Weight Averaged Reward Models
WARM: On the Benefits of Weight Averaged Reward Models
Alexandre Ramé
Nino Vieillard
Léonard Hussenot
Robert Dadashi
Geoffrey Cideron
Olivier Bachem
Johan Ferret
100
92
0
22 Jan 2024
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for
  Autonomous Driving
SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving
Ming Zhou
Jun-Jie Luo
Julian Villela
Yaodong Yang
David Rusu
...
H. Ammar
Hongbo Zhang
Wulong Liu
Jianye Hao
Jun Wang
131
192
0
19 Oct 2020
Curriculum Loss: Robust Learning and Generalization against Label
  Corruption
Curriculum Loss: Robust Learning and Generalization against Label Corruption
Yueming Lyu
Ivor W. Tsang
NoLa
47
170
0
24 May 2019
1